Abstract

To help understand various reproducing kernels used in applied sciences, we investigate the
inclusion relation of two reproducing kernel Hilbert spaces. Characterizations in terms of feature
maps of the corresponding reproducing kernels are established. A full table of inclusion relations
among widely-used translation invariant kernels is given. Concrete examples for Hilbert-Schmidt
kernels are presented as well. We also discuss the preservation of such a relation under various
operations of reproducing kernels. Finally, we briefly discuss the special inclusion with a norm
equivalence.
1 Introduction

Reproducing kernel Hilbert spaces (RKHS) are Hilbert spaces of functions on which point evaluations
are always continuous linear functionals. They are the natural choice of background spaces for many
applications. First of all, thanks to the existence of an inner product, Hilbert spaces are the normed
vector spaces that are best understood and easiest to handle. Secondly, the inputs for many
application-oriented algorithms are usually modeled as the sample data of some desirable but unknown
function. Requiring the sampling process to be stable seems to be a necessity. Mathematically,
this is synonymous with desiring point evaluation functionals to be bounded. For these reasons,
RKHS are widely applicable in probability and statistics [2, 19], dimension reduction [9], numerical
study of differential equations [5, 10], generalizations of the Shannon sampling theory [13, 26], and
approximation from scattered data [22]. Moreover, an RKHS possesses a unique function, named a
reproducing kernel, which represents point evaluations on the space. Reproducing kernels are able to
measure the similarity between inputs and can save the calculation of inner products in a feature
space [17]. This gives birth to the "kernel trick" in machine learning and makes RKHS the popular
underlying feature spaces for applications in the field. As a result, reproducing kernel based methods
are dominant in machine learning [H \7\ [TT} \TE[ [2T] . 

Despite the wide applications of RKHS, there are some important theoretical issues that are not
well understood. This paper is devoted to the inclusion relation between RKHS, that is, given two



'School of Mathematics and Computational Science and Guangdong Province Key Laboratory of Computational 
Science, Sun Yat-sen University, Guangzhou 510275, P. R. China. E-mail address: zhhaizh2@sysu.edu.cn. Supported in 
part by Guangdong Provincial Government of China through the "Computational Science Innovative Research Team" 
program. 

^Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA. E-mail address: lzhao04-@syr.edu. 
Supported in part by US Air Force Office of Scientific Research under grant FA9550-09-1-0511. 






reproducing kernels, we are interested in whether the RKHS of one reproducing kernel is contained
in the RKHS of the other. Clarifying this problem helps to understand the structure of
RKHS and hence contributes to the theory of reproducing kernels [1]. For instance, the relation
is needed in building a multi-resolution decomposition of RKHS. Besides, the study could provide 
guidelines to the choice of reproducing kernels in machine learning. There are many reproducing 
kernels in the literature. In a particular application, the selection of reproducing kernels is usually 
critical to the success of a learning algorithm. While there are no well-recognized guidelines in making 
such a decision, avoiding overfitting or underfitting is usually the first principle. When overfitting or 
underfitting occurs, a remedy is to change the current reproducing kernel so that the RKHS of the new 
kernel becomes smaller or larger compared to that of the existing kernel. Understanding the inclusion 
relation between RKHS could help achieve such an update of reproducing kernels. 

Three characterizations of the inclusion relation of RKHS were established before the 1970s [1, 6, 25].
With the advent of machine learning in the 1990s, there has been increasing interest in reproducing kernels
and RKHS. Many concrete reproducing kernels have emerged in the literature and in applications.
Most of them can be conveniently represented by a feature map, which was unknown in the past studies
[UEIES]. The purpose of this paper is to provide a systematic study of the inclusion relation of RKHS 
with focus on the concrete examples of RKHS appeared in machine learning. Recent references [231 [23] 
studied the embedding relation of RKHS, that is, an equal norm requirement is imposed. As shown by
the examples therein, the requirement that two RKHS share the same norm on the smaller space can
be demanding and rules out many commonly-used RKHS. For example, the RKHS of a Gaussian kernel
cannot be properly embedded into the RKHS of another translation invariant reproducing kernel of a
continuous type. By relaxing the requirement, we shall see more applications and obtain more structural
results.

The outline of the paper is as follows. We shall discuss characterizations of the inclusion relation 
in the next section. Sections 3 and 4 are devoted to the investigation of concrete translation invariant 
and Hilbert-Schmidt reproducing kernels, respectively. Particularly, we shall establish a full table of 
inclusion relations among popular translation invariant reproducing kernels in Section 3. In Section 5, 
we discuss the preservation of the relation under various operations of reproducing kernels. In the last 
section, we shall briefly discuss the special inclusion relation where a norm equivalence is required. 

2 Characterizations 

We start by introducing some basics of the theory of reproducing kernels [1]. Let $X$ be a prescribed
set, which is often referred to as an input space in machine learning. A reproducing kernel (or kernel
for short) $K$ on $X$ is a function from $X \times X$ to $\mathbb{C}$ such that for all finite sets of pairwise distinct inputs
$\mathbf{x} := \{x_j : j \in \mathbb{N}_n\} \subseteq X$, the kernel matrix

$$K[\mathbf{x}] := [K(x_j, x_k) : j, k \in \mathbb{N}_n]$$

is hermitian and positive semi-definite. Here, for simplicity in enumerating finite sets, we
denote $\mathbb{N}_n := \{1, 2, \ldots, n\}$ for each $n \in \mathbb{N}$. A reproducing kernel $K$ on $X$ corresponds to a unique
RKHS, denoted by $\mathcal{H}_K$, such that $K(x, \cdot) \in \mathcal{H}_K$ for all $x \in X$ and

$$f(x) = (f, K(x, \cdot))_{\mathcal{H}_K} \quad \text{for all } f \in \mathcal{H}_K,\ x \in X, \tag{2.1}$$

where $(\cdot, \cdot)_{\mathcal{H}_K}$ denotes the inner product on $\mathcal{H}_K$. There is a characterization of reproducing kernels in
terms of feature maps. A feature map for a kernel $K$ on $X$ is a mapping $\Phi$ from $X$ to another Hilbert
space $\mathcal{W}$ such that

$$K(x, y) = (\Phi(x), \Phi(y))_{\mathcal{W}}, \quad x, y \in X. \tag{2.2}$$

The space $\mathcal{W}$ is called a feature space for the kernel $K$. One observes from (2.1) that

$$K(x, y) = (K(x, \cdot), K(y, \cdot))_{\mathcal{H}_K}, \quad x, y \in X.$$

Thus, $\Phi(x) := K(x, \cdot)$, $x \in X$, and $\mathcal{W} := \mathcal{H}_K$ is a pair of feature map and feature space for $K$. The
RKHS of a reproducing kernel can be easily identified once a feature map representation is available.
The following result is well known in the machine learning community [14, 17, 23].

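For a feature map $\Phi : X \to \mathcal{W}$, we shall denote by $P_\Phi$ the orthogonal projection from $\mathcal{W}$ onto the
closed linear span $\overline{\operatorname{span}}\,\Phi(X)$ of $\Phi(X)$.

Lemma 2.1 If $K$ is a kernel on $X$ given by (2.2) by a feature map $\Phi$ from $X$ to $\mathcal{W}$ then
$\mathcal{H}_K = \{(u, \Phi(\cdot))_{\mathcal{W}} : u \in \mathcal{W}\}$ with the inner product

$$\big((u, \Phi(\cdot))_{\mathcal{W}}, (v, \Phi(\cdot))_{\mathcal{W}}\big)_{\mathcal{H}_K} = (P_\Phi u, P_\Phi v)_{\mathcal{W}}, \quad u, v \in \mathcal{W}. \tag{2.3}$$

In particular, if $\operatorname{span}\Phi(X)$ is dense in $\mathcal{W}$ then $\mathcal{H}_K$ is isometrically isomorphic to $\mathcal{W}$ through the
linear mapping $(u, \Phi(\cdot))_{\mathcal{W}} \mapsto u$.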

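As an example, we look at the sinc kernel

$$K(x, y) = \operatorname{sinc}(x - y) := \prod_{j=1}^{d} \frac{\sin \pi(x_j - y_j)}{\pi(x_j - y_j)}, \quad x, y \in \mathbb{R}^d.$$

It can be represented as the Fourier transform of $\frac{1}{(\sqrt{2\pi})^d}\chi_{[-\pi,\pi]^d}$, where $\chi_A$ is the characteristic function
of a subset $A \subseteq \mathbb{R}^d$. In this paper, we adopt the following forms of the Fourier transform $\hat f$ and the
inverse Fourier transform $\check f$ of a Lebesgue integrable function $f \in L^1(\mathbb{R}^d)$:

$$\hat f(\xi) := \Big(\frac{1}{\sqrt{2\pi}}\Big)^d \int_{\mathbb{R}^d} f(x) e^{-i(x,\xi)}\,dx, \qquad \check f(\xi) := \Big(\frac{1}{\sqrt{2\pi}}\Big)^d \int_{\mathbb{R}^d} f(x) e^{i(x,\xi)}\,dx, \quad \xi \in \mathbb{R}^d.$$

Here $(x, \xi)$ is the standard inner product on $\mathbb{R}^d$. Thus, one sees that

$$\operatorname{sinc}(x - y) = \frac{1}{(2\pi)^d}\int_{[-\pi,\pi]^d} e^{-i(\xi, x - y)}\,d\xi, \quad x, y \in \mathbb{R}^d. \tag{2.4}$$

Thus $\mathcal{W} := L^2([-\pi,\pi]^d)$ and $\Phi(x) := \big(\frac{1}{\sqrt{2\pi}}\big)^d e^{-i(\cdot, x)}$, $x \in \mathbb{R}^d$, satisfy (2.2). Lemma 2.1 tells that $\mathcal{H}_K$ is
the space of continuous square integrable functions on $\mathbb{R}^d$ whose Fourier transforms are supported on
$[-\pi,\pi]^d$ and that the inner product on $\mathcal{H}_K$ is inherited from that of $L^2(\mathbb{R}^d)$. This is well known. We use it to
illustrate the application of Lemma 2.1.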
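The integral representation (2.4) can be checked numerically. The following is a minimal sketch for $d = 1$, assuming the normalization $\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-i\xi t}\,d\xi$ with $t = x - y$; the function names and the midpoint quadrature are illustrative choices, not part of the paper.

```python
import cmath
import math


def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def sinc_via_fourier(t, n=20000):
    """Midpoint-rule approximation of (1/(2 pi)) * int_{-pi}^{pi} exp(-i xi t) d xi."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        xi = -math.pi + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += cmath.exp(-1j * xi * t)
    # The imaginary part cancels by symmetry; keep the real part.
    return (total * h / (2 * math.pi)).real


if __name__ == "__main__":
    for t in (0.0, 0.3, 1.0, 2.5):
        print(t, sinc(t), sinc_via_fourier(t))
```

The two columns printed by the script should agree up to the quadrature error, which illustrates that the feature map $\Phi(x) = \frac{1}{\sqrt{2\pi}} e^{-i(\cdot,x)}$ on $L^2([-\pi,\pi])$ reproduces the sinc kernel.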

Given two kernels $K, G$ on a prescribed input space $X$, the corresponding RKHS $\mathcal{H}_K, \mathcal{H}_G$ can
usually be identified by Lemma 2.1. The theme of the paper is the set inclusion relation $\mathcal{H}_K \subseteq \mathcal{H}_G$.
As point evaluations are continuous on RKHS, it was observed in [1] that if $\mathcal{H}_K \subseteq \mathcal{H}_G$ then the
identity operator from $\mathcal{H}_K$ into $\mathcal{H}_G$ is bounded. We shall denote by $\beta(K, G)$ the operator norm of
this embedding. A characterization of $\mathcal{H}_K \subseteq \mathcal{H}_G$ was also established in [1]. Following [1], we write
$K \ll G$ if $G - K$ remains a kernel on $X$.

Lemma 2.2 Let $K, G$ be two kernels on $X$. Then $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if there exists a nonnegative
constant $\lambda \ge 0$ such that $K \ll \lambda G$.
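On a finite sample, the condition $K \ll \lambda G$ of Lemma 2.2 requires the matrix $\lambda G[\mathbf{x}] - K[\mathbf{x}]$ to be positive semi-definite. The sketch below tests this for two one-dimensional exponential kernels $\exp(-|x-y|/\sigma)$. The kernels, the sample points, the helper names and the Cholesky-with-jitter test are all illustrative assumptions. A finite sample can only refute $K \ll \lambda G$, never prove it.

```python
import math


def exp_kernel(sigma):
    """Hypothetical example kernel exp(-|x - y| / sigma) on the real line."""
    return lambda x, y: math.exp(-abs(x - y) / sigma)


def gram(kernel, points):
    """Kernel matrix [kernel(x_j, x_k)] on the given sample."""
    return [[kernel(x, y) for y in points] for x in points]


def is_psd(matrix, jitter=1e-12):
    """Heuristic PSD test: does matrix + jitter * I admit a Cholesky factor?"""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] + jitter - s
                if pivot <= 0.0:
                    return False
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return True


def dominates(K, G, lam, points):
    """Check on a finite sample whether lam * G - K has a PSD kernel matrix."""
    k, g = gram(K, points), gram(G, points)
    n = len(points)
    diff = [[lam * g[i][j] - k[i][j] for j in range(n)] for i in range(n)]
    return is_psd(diff)


if __name__ == "__main__":
    pts = [0.0, 0.5, 1.0, 1.7, 2.4]
    K, G = exp_kernel(1.0), exp_kernel(2.0)
    print(dominates(K, G, 2.0, pts))  # lam = sigma_2 / sigma_1
    print(dominates(K, G, 1.0, pts))  # lam too small for this pair
```

With $\lambda = 1$ the difference matrix has zero diagonal and nonzero off-diagonal entries, so it cannot be positive semi-definite; this is why the second check fails.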

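Provided that $\mathcal{H}_K \subseteq \mathcal{H}_G$, we shall denote by $\lambda(K, G)$ the infimum of the set of positive constants
$\lambda$ such that $K \ll \lambda G$. If $\mathcal{H}_K \nsubseteq \mathcal{H}_G$ then we make the convention that $\lambda(K, G) = +\infty$. We first make
a simple observation about the two quantities $\beta(K, G)$ and $\lambda(K, G)$.

Proposition 2.3 Let $K, G$ be two kernels on $X$ with $\mathcal{H}_K \subseteq \mathcal{H}_G$. Then $\beta(K, G) = \sqrt{\lambda(K, G)}$ and
$K \ll \lambda(K, G)G$.

Proof: It was proved in [1] that for two kernels $K, L$ on $X$, $K \ll L$ if and only if $\mathcal{H}_K \subseteq \mathcal{H}_L$ and
$\|f\|_{\mathcal{H}_L} \le \|f\|_{\mathcal{H}_K}$ for all $f \in \mathcal{H}_K$. Note by Lemma 2.1 that $\mathcal{H}_G$ and $\mathcal{H}_{\lambda G}$ share common elements and
that for all $f \in \mathcal{H}_G$,

$$\|f\|_{\mathcal{H}_G} = \sqrt{\lambda}\,\|f\|_{\mathcal{H}_{\lambda G}}.$$

Combining these two facts, we get for all $\lambda > 0$ that $K \ll \lambda G$ if and only if $\mathcal{H}_K \subseteq \mathcal{H}_G$ and
$\|f\|_{\mathcal{H}_G} \le \sqrt{\lambda}\,\|f\|_{\mathcal{H}_K}$ for all $f \in \mathcal{H}_K$. Thus, if $K \ll \lambda G$ then $\beta(K, G) \le \sqrt{\lambda}$. It follows that
$\beta(K, G) \le \sqrt{\lambda(K, G)}$. On the other hand, if $0 < \beta < \beta(K, G)$ then there exists some $f \in \mathcal{H}_K$ for which
$\|f\|_{\mathcal{H}_G} > \beta\|f\|_{\mathcal{H}_K}$. It implies that $K \ll \beta^2 G$ does not hold. As a consequence, $\lambda(K, G) \ge \beta^2$. We
hence have that $\sqrt{\lambda(K, G)} \ge \beta(K, G)$, leading to the equality

$$\beta(K, G) = \sqrt{\lambda(K, G)},$$

which in turn implies that $K \ll \lambda(K, G)G$. □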

We next present another characterization of the inclusion relation in terms of feature maps of 
reproducing kernels. 

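Theorem 2.4 Let $K, G$ be two kernels on $X$ with the feature maps $\Phi_1 : X \to \mathcal{W}_1$ and $\Phi_2 : X \to \mathcal{W}_2$,
respectively. If $\overline{\operatorname{span}}\,\Phi_1(X) = \mathcal{W}_1$ and $\overline{\operatorname{span}}\,\Phi_2(X) = \mathcal{W}_2$ then $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if there exists a
bounded linear operator $T : \mathcal{W}_2 \to \mathcal{W}_1$ such that

$$T\Phi_2(x) = \Phi_1(x), \quad x \in X. \tag{2.5}$$

Moreover, the inclusion is nontrivial if and only if the adjoint operator $T^*$ of $T$ is not surjective.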

Proof: The result can be proved by similar arguments as those in Theorems 6 and 7 of [23]. □ 

By the above theorem, the particular choices

$$\mathcal{W}_1 := \mathcal{H}_K,\quad \Phi_1(x) := K(x, \cdot),\quad \mathcal{W}_2 := \mathcal{H}_G,\quad \Phi_2(x) := G(x, \cdot),\quad x \in X$$

yield that $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if there exists a bounded operator $L : \mathcal{H}_G \to \mathcal{H}_K$ such that
$LG(x, \cdot) = K(x, \cdot)$ for all $x \in X$. We remark that this result in the special case when $X$ is a countable
dense subset of $\mathbb{R}^d$ was proved in [6].






3 Translation Invariant Kernels and Radial Basis Functions 



Translation invariant kernels are the most widely-used class of reproducing kernels on the Euclidean
space. A kernel $K$ on $\mathbb{R}^d$ is said to be translation invariant if

$$K(x - a, y - a) = K(x, y) \quad \text{for all } x, y, a \in \mathbb{R}^d.$$

There is a celebrated characterization of continuous translation invariant kernels on $\mathbb{R}^d$ due to Bochner
[3]. The result is usually referred to as the Bochner theorem. Denote by $\mathcal{B}(\mathbb{R}^d)$ the set of all the finite
positive Borel measures on $\mathbb{R}^d$. The characterization states that continuous translation invariant
kernels on $\mathbb{R}^d$ are exactly the Fourier transforms of finite positive Borel measures in $\mathcal{B}(\mathbb{R}^d)$. Thus we
shall consider the inclusion relation $\mathcal{H}_K \subseteq \mathcal{H}_G$ for two translation invariant kernels $K, G$ of the form

$$K(x, y) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)}\,d\mu(\xi), \quad x, y \in \mathbb{R}^d \tag{3.1}$$

and

$$G(x, y) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)}\,d\nu(\xi), \quad x, y \in \mathbb{R}^d, \tag{3.2}$$

where $\mu, \nu \in \mathcal{B}(\mathbb{R}^d)$.

Let $\mu, \nu$ be two finite Borel measures on a topological space $Y$. Recall that $\mu$ is said to be
absolutely continuous with respect to $\nu$, denoted as $\mu \ll \nu$, if $\mu$ vanishes on Borel subsets of $Y$ with
zero $\nu$ measure. When $\mu \ll \nu$, $d\mu/d\nu$ is a Borel measurable function on $Y$ such that

$$\mu(A) = \int_A \frac{d\mu}{d\nu}(x)\,d\nu(x) \quad \text{for all Borel subsets } A \subseteq Y.$$

We denote by $L^\infty_\nu(Y)$ the space of Borel measurable functions on $Y$ with the norm

$$\|f\|_{L^\infty_\nu(Y)} := \inf\{M \ge 0 : \nu(\{t \in Y : |f(t)| > M\}) = 0\} < +\infty.$$

For later use, we also denote by $L^2_\nu(Y)$ the Hilbert space of Borel measurable functions on $Y$ such that

$$\|f\|_{L^2_\nu(Y)} := \Big(\int_Y |f(t)|^2\,d\nu(t)\Big)^{1/2} < +\infty.$$

Proposition 3.1 Let $K, G$ be two continuous translation invariant kernels on $\mathbb{R}^d$ given by (3.1) and
(3.2). Then $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if $\mu \ll \nu$ and $d\mu/d\nu \in L^\infty_\nu(\mathbb{R}^d)$. In the case that $\mathcal{H}_K \subseteq \mathcal{H}_G$,

$$\lambda(K, G) = \Big\|\frac{d\mu}{d\nu}\Big\|_{L^\infty_\nu(\mathbb{R}^d)}. \tag{3.3}$$

Proof: By Lemma 2.2, $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if there exists some $\lambda \ge 0$ such that $\lambda G - K$ is a kernel on
$\mathbb{R}^d$. Note that for all $\lambda \ge 0$, $\lambda G - K$ is still translation invariant. Therefore, by the Bochner theorem,
$K \ll \lambda G$ if and only if $\lambda\nu - \mu \in \mathcal{B}(\mathbb{R}^d)$, which happens if and only if $\mu \ll \nu$ and $d\mu/d\nu$ is bounded by
$\lambda$ almost everywhere on $\mathbb{R}^d$ with respect to $\nu$. We hence get that $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if $\mu \ll \nu$ and
$d\mu/d\nu \in L^\infty_\nu(\mathbb{R}^d)$. When $\mu \ll \nu$ and $d\mu/d\nu \in L^\infty_\nu(\mathbb{R}^d)$, it is clear that (3.3) holds. □

We pay special attention to the situation when the Borel measures in (3.1) and (3.2) are absolutely
continuous with respect to the Lebesgue measure. In this case, by the Radon-Nikodym theorem, $K, G$
are the Fourier transforms of nonnegative Lebesgue integrable functions on $\mathbb{R}^d$.
Corollary 3.2 Let $u, v$ be nonnegative functions in $L^1(\mathbb{R}^d)$ and let $K, G$ be defined by

$$K(x, y) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} u(\xi)\,d\xi, \qquad G(x, y) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} v(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d. \tag{3.4}$$

Then $\mathcal{H}_K \subseteq \mathcal{H}_G$ if and only if the set $\{t \in \mathbb{R}^d : u(t) > 0,\ v(t) = 0\}$ has Lebesgue measure zero and $u/v$
is essentially bounded on $\{t \in \mathbb{R}^d : v(t) > 0\}$, in which case $\lambda(K, G)$ equals the essential upper bound
of $u/v$ on $\{t \in \mathbb{R}^d : v(t) > 0\}$. In particular, if $v$ is positive almost everywhere on $\mathbb{R}^d$ then $\mathcal{H}_K \subseteq \mathcal{H}_G$
if and only if $u/v \in L^\infty(\mathbb{R}^d)$, in which case $\lambda(K, G) = \|u/v\|_{L^\infty(\mathbb{R}^d)}$.
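The criterion of Corollary 3.2 is easy to explore numerically by scanning the ratio $u/v$ of two spectral densities on a grid. The sketch below does this for $d = 1$. It assumes, as an illustration, the Gaussian kernel $\exp(-(x-y)^2/\gamma)$ with spectral density $\frac{\sqrt{\gamma}}{2\sqrt{\pi}}\exp(-\gamma\xi^2/4)$; the function names, the grid and the window are hypothetical choices. A grid maximum only estimates the essential supremum.

```python
import math


def gaussian_density(gamma):
    """Assumed spectral density of exp(-(x - y)^2 / gamma) in dimension one."""
    c = math.sqrt(gamma) / (2.0 * math.sqrt(math.pi))
    return lambda xi: c * math.exp(-gamma * xi * xi / 4.0)


def ratio_sup(u, v, lo=-30.0, hi=30.0, n=6001):
    """Grid estimate of the essential supremum of u / v on {v > 0}.

    Returns math.inf if some grid point has u > 0 while v == 0,
    which corresponds to a failure of the inclusion.
    """
    best = 0.0
    for k in range(n):
        xi = lo + (hi - lo) * k / (n - 1)
        uu, vv = u(xi), v(xi)
        if vv > 0.0:
            best = max(best, uu / vv)
        elif uu > 0.0:
            return math.inf
    return best


if __name__ == "__main__":
    g1, g2 = gaussian_density(1.0), gaussian_density(2.0)
    print(ratio_sup(g2, g1))  # bounded ratio: estimate of lambda(G_2, G_1)
    print(ratio_sup(g1, g2))  # grows with the window: no inclusion
```

A bounded ratio whose estimate is stable as the window `[lo, hi]` widens suggests inclusion; an estimate that keeps growing with the window suggests that $u/v$ is unbounded.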

An important class of translation invariant kernels on $\mathbb{R}^d$ is given by radial basis functions. Those
are reproducing kernels of the form

$$K_d(x, y) = g(\|x - y\|), \quad x, y \in \mathbb{R}^d, \tag{3.5}$$

where $g$ is a single-variate function on $\mathbb{R}_+ := [0, +\infty)$ and $\|\cdot\|$ is the standard Euclidean norm on $\mathbb{R}^d$.
The following well-known characterizations of kernels of the form (3.5) are due to Schoenberg [15].

For each $d \in \mathbb{N}$, denote by $d\omega_d$ and $\omega_d$ the area element and total area of the unit sphere of $\mathbb{R}^d$,
respectively. Also set

$$\Omega_d(\|x\|) := \frac{1}{\omega_d}\int_{\|\xi\| = 1} e^{i(x, \xi)}\,d\omega_d(\xi), \quad x \in \mathbb{R}^d.$$

Lemma 3.3 Let $g$ be a function on $\mathbb{R}_+$. Then (3.5) defines a reproducing kernel on $\mathbb{R}^d$ if and only if
there is a finite positive Borel measure $\mu$ on $\mathbb{R}_+$ such that

$$K_d(x, y) = \int_0^\infty \Omega_d(t\|x - y\|)\,d\mu(t), \quad x, y \in \mathbb{R}^d. \tag{3.6}$$

Furthermore, equation (3.5) defines a reproducing kernel $K_d$ on $\mathbb{R}^d$ for all $d \in \mathbb{N}$ if and only if

$$K_d(x, y) = \int_0^\infty e^{-t\|x - y\|^2}\,d\mu(t), \quad x, y \in \mathbb{R}^d \tag{3.7}$$

for some finite positive Borel measure $\mu$ on $\mathbb{R}_+$.

Notice that both $\operatorname{span}\{\Omega_d(t r) : r > 0\}$ and $\operatorname{span}\{e^{-t r^2} : r > 0\}$, viewed as sets of functions of $t$, are dense in $C_0(\mathbb{R}_+)$, the space
of continuous functions on $\mathbb{R}_+$ vanishing at infinity equipped with the maximum norm. By this fact
and Lemma 3.3, one may use arguments similar to those in the proof of Proposition 3.1 to get the
following characterizations of the inclusion relation of RKHS of kernels of the form (3.5).



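Proposition 3.4 Let $\mu, \nu$ be two finite positive Borel measures on $\mathbb{R}_+$, let $K_d$ be given by (3.6) and
set

$$G_d(x, y) := \int_0^\infty \Omega_d(t\|x - y\|)\,d\nu(t), \quad x, y \in \mathbb{R}^d. \tag{3.8}$$

Then $\mathcal{H}_{K_d} \subseteq \mathcal{H}_{G_d}$ if and only if $\mu \ll \nu$ and $d\mu/d\nu \in L^\infty_\nu(\mathbb{R}_+)$, in which case $\lambda(K_d, G_d) = \|d\mu/d\nu\|_{L^\infty_\nu(\mathbb{R}_+)}$.
If $K_d$ is given by (3.7) and $G_d$ is defined by

$$G_d(x, y) = \int_0^\infty e^{-t\|x - y\|^2}\,d\nu(t), \quad x, y \in \mathbb{R}^d, \tag{3.9}$$

then $\mathcal{H}_{K_d} \subseteq \mathcal{H}_{G_d}$ for all $d \in \mathbb{N}$ and $\{\lambda(K_d, G_d) : d \in \mathbb{N}\}$ is bounded if and only if $\mu \ll \nu$ and
$d\mu/d\nu \in L^\infty_\nu(\mathbb{R}_+)$, in which case $\sup\{\lambda(K_d, G_d) : d \in \mathbb{N}\} = \|d\mu/d\nu\|_{L^\infty_\nu(\mathbb{R}_+)}$.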
One may specialize the statements in the above proposition to the case when $\mu, \nu$ are absolutely continuous
with respect to the Lebesgue measure on $\mathbb{R}_+$ to get results similar to those in Corollary 3.2, which
we shall not state here.

We next turn to the main purpose of this section, which is to explore the inclusion relations among 
the RKHS of six commonly used translation invariant kernels in machine learning and other areas of 
applied mathematics. To apply the characterizations established above, we present those kernels in 
the form they appear in the characterization of Bochner or Schoenberg: 

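- the Gaussian kernel

$$G_\gamma(x, y) = \exp\Big(-\frac{\|x - y\|^2}{\gamma}\Big) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} g_\gamma(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ \gamma > 0, \tag{3.10}$$

where

$$g_\gamma(\xi) := \Big(\frac{\sqrt{\gamma}}{2\sqrt{\pi}}\Big)^d \exp\Big(-\frac{\gamma\|\xi\|^2}{4}\Big), \quad \xi \in \mathbb{R}^d.$$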

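- the $\ell^1$-norm exponential kernel

$$E_{\sigma_1}(x, y) = \exp\Big(-\frac{\|x - y\|_1}{\sigma_1}\Big) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} \varphi_{\sigma_1}(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ \sigma_1 > 0, \tag{3.11}$$

where $\|x\|_1 := \sum_{j=1}^d |x_j|$, $x = (x_j : j \in \mathbb{N}_d) \in \mathbb{R}^d$ and

$$\varphi_{\sigma_1}(\xi) := \prod_{j=1}^d \frac{\sigma_1}{\pi(1 + \sigma_1^2 \xi_j^2)}, \quad \xi \in \mathbb{R}^d.$$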

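- the $\ell^2$-norm exponential kernel

$$\mathcal{E}_{\sigma_2}(x, y) = \exp\Big(-\frac{\|x - y\|}{\sigma_2}\Big) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} \psi_{\sigma_2}(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ \sigma_2 > 0, \tag{3.12}$$

where

$$\psi_{\sigma_2}(\xi) := \frac{\Gamma(\frac{d+1}{2})}{\pi^{\frac{d+1}{2}}}\,\frac{\sigma_2^d}{(1 + \sigma_2^2\|\xi\|^2)^{\frac{d+1}{2}}}, \quad \xi \in \mathbb{R}^d. \tag{3.13}$$

Here, $\Gamma$ denotes the Gamma function and the Fourier transform is identified by the Poisson
kernel (see, for example, [20], page 61).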

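- the inverse multiquadrics

$$M_\beta(x, y) := \frac{1}{(1 + \|x - y\|^2)^\beta} = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} m_\beta(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ \beta > 0, \tag{3.14}$$

where

$$m_\beta(\xi) := \frac{1}{(2\sqrt{\pi})^d\,\Gamma(\beta)}\int_0^\infty t^{\beta - \frac{d}{2} - 1}\exp\Big(-t - \frac{\|\xi\|^2}{4t}\Big)\,dt, \quad \xi \in \mathbb{R}^d. \tag{3.15}$$

This formulation can be obtained by combining Theorem 7.15 in [22] and the Fourier transform
of the Gaussian function.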
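- the B-spline kernel

$$B_p(x, y) := \prod_{j=1}^d B_p(x_j - y_j) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} b_p(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ p \in 2\mathbb{N}, \tag{3.16}$$

where $B_p$ denotes the $p$-th order cardinal B-spline, and with $\operatorname{sinc}_{\frac12}(t) := \frac{\sin(t/2)}{t/2}$, $t \in \mathbb{R}$,

$$b_p(\xi) := \frac{1}{(2\pi)^d}\prod_{j=1}^d \big(\operatorname{sinc}_{\frac12}(\xi_j)\big)^p, \quad \xi \in \mathbb{R}^d.$$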

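- the ANOVA kernel

$$A_\tau(x, y) := \sum_{j=1}^d \exp\Big(-\frac{|x_j - y_j|^2}{\tau}\Big) = \int_{\mathbb{R}^d} e^{i(x - y, \xi)} a_\tau(\xi)\,d\xi, \quad x, y \in \mathbb{R}^d,\ \tau > 0, \tag{3.17}$$

where

$$a_\tau(\xi) := \frac{\sqrt{\tau}}{2\sqrt{\pi}}\sum_{j=1}^d \exp\Big(-\frac{\tau\xi_j^2}{4}\Big), \quad \xi \in \mathbb{R}^d.$$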

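Among those kernels, the Gaussian kernel, the $\ell^2$-norm exponential kernel, and the inverse
multiquadrics are radial basis functions. We also give their representation by the Laplace transform
below:

- the Gaussian kernel

$$G_\gamma(x, y) = \exp\Big(-\frac{\|x - y\|^2}{\gamma}\Big) = \int_0^\infty e^{-t\|x - y\|^2}\,d\delta_{\gamma^{-1}}(t), \quad x, y \in \mathbb{R}^d, \tag{3.18}$$

where $\delta_t$ denotes the unit measure supported at the singleton $\{t\}$.

- the $\ell^2$-norm exponential kernel

$$\mathcal{E}_{\sigma_2}(x, y) = \exp\Big(-\frac{\|x - y\|}{\sigma_2}\Big) = \frac{1}{2\sigma_2\sqrt{\pi}}\int_0^\infty e^{-t\|x - y\|^2}\exp\Big(-\frac{1}{4\sigma_2^2 t}\Big)\frac{1}{t^{3/2}}\,dt, \quad x, y \in \mathbb{R}^d. \tag{3.19}$$

This equation is derived from the identity (see [20], page 61) that

$$e^{-s} = \frac{1}{\sqrt{\pi}}\int_0^\infty \frac{e^{-u}}{\sqrt{u}}\,e^{-\frac{s^2}{4u}}\,du, \quad s > 0.$$

- the inverse multiquadrics (see [22], page 95)

$$M_\beta(x, y) = \frac{1}{(1 + \|x - y\|^2)^\beta} = \frac{1}{\Gamma(\beta)}\int_0^\infty e^{-t\|x - y\|^2}\,t^{\beta - 1}e^{-t}\,dt, \quad x, y \in \mathbb{R}^d. \tag{3.20}$$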
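The Laplace transform representation (3.20) of the inverse multiquadric can be verified by direct quadrature. The following minimal sketch compares $(1 + r^2)^{-\beta}$ with a midpoint-rule approximation of $\frac{1}{\Gamma(\beta)}\int_0^\infty e^{-t r^2} t^{\beta-1} e^{-t}\,dt$, where $r = \|x - y\|$. The truncation point `t_max`, the number of nodes and the function names are illustrative assumptions.

```python
import math


def inverse_multiquadric(r, beta):
    """Closed form (1 + r^2)^(-beta)."""
    return (1.0 + r * r) ** (-beta)


def imq_via_laplace(r, beta, t_max=60.0, n=200000):
    """Midpoint rule for (1/Gamma(beta)) * int_0^t_max exp(-t r^2) t^(beta-1) exp(-t) dt.

    The integrand decays like exp(-t), so truncating at t_max = 60
    discards a negligible tail.
    """
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.exp(-t * (1.0 + r * r)) * t ** (beta - 1.0)
    return total * h / math.gamma(beta)


if __name__ == "__main__":
    for r, beta in ((1.0, 2.0), (0.0, 3.0), (2.0, 2.0)):
        print(r, beta, inverse_multiquadric(r, beta), imq_via_laplace(r, beta))
```

For $\beta < 1$ the integrand is singular at $t = 0$ and a plain midpoint rule converges slowly, so the sketch is best used with $\beta \ge 1$.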

As a straightforward application of Corollary 3.2, we have the following inclusion relations between
the RKHS of kernels of the same kind.
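Proposition 3.5 The following statements hold true:
(1) For $0 < \gamma_1 < \gamma_2$, $\mathcal{H}_{G_{\gamma_2}} \subseteq \mathcal{H}_{G_{\gamma_1}}$ with

$$\lambda(G_{\gamma_2}, G_{\gamma_1}) = \Big(\frac{\gamma_2}{\gamma_1}\Big)^{d/2},$$

but $\mathcal{H}_{G_{\gamma_1}} \nsubseteq \mathcal{H}_{G_{\gamma_2}}$.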
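(2) For $0 < \sigma_1 < \sigma_2$, $\mathcal{H}_{E_{\sigma_1}} = \mathcal{H}_{E_{\sigma_2}}$ with

$$\lambda(E_{\sigma_1}, E_{\sigma_2}) = \lambda(E_{\sigma_2}, E_{\sigma_1}) = \Big(\frac{\sigma_2}{\sigma_1}\Big)^d.$$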

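(3) For $0 < \sigma_1 < \sigma_2$, $\mathcal{H}_{\mathcal{E}_{\sigma_1}} = \mathcal{H}_{\mathcal{E}_{\sigma_2}}$ with

$$\lambda(\mathcal{E}_{\sigma_1}, \mathcal{E}_{\sigma_2}) = \frac{\sigma_2}{\sigma_1}, \qquad \lambda(\mathcal{E}_{\sigma_2}, \mathcal{E}_{\sigma_1}) = \Big(\frac{\sigma_2}{\sigma_1}\Big)^d.$$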

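(4) For $p, q \in 2\mathbb{N}$ with $p < q$, $\mathcal{H}_{B_q} \subseteq \mathcal{H}_{B_p}$ with $\lambda(B_q, B_p) = 1$, but $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_{B_q}$.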

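(5) For $0 < \tau_1 < \tau_2$, $\mathcal{H}_{A_{\tau_2}} \subseteq \mathcal{H}_{A_{\tau_1}}$ with $\lambda(A_{\tau_2}, A_{\tau_1}) = \sqrt{\tau_2/\tau_1}$, but $\mathcal{H}_{A_{\tau_1}} \nsubseteq \mathcal{H}_{A_{\tau_2}}$.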



The inclusion relation for the RKHS of two inverse multiquadrics is more involved and is separated 
below. 

Theorem 3.6 Let $\beta_1, \beta_2$ be two distinct positive constants. There holds $\mathcal{H}_{M_{\beta_1}} \subseteq \mathcal{H}_{M_{\beta_2}}$ if and only if

$$\frac{d}{2} < \beta_1 < \beta_2.$$

Proof: Suppose first that $\beta_1 > \beta_2$. By the same technique used in Theorem 6.13 of [22] and equation
(3.15), one obtains for all $\beta > 0$ that



_2i 

( " 

where $K_\nu$, $\nu \in \mathbb{R}$, is the modified Bessel function defined by

$$K_\nu(r) := \int_0^\infty e^{-r\cosh t}\cosh(\nu t)\,dt, \quad r > 0.$$



We use the estimates (see, [22], pages 52-53) about Ki, that there exists a constant Ci, depending on 
only such that 



and that 



K4r)>Cu^, r > 1 (3.22) 



K^{r) < exp 1^-^ ], r > 0. (3.23) 



Combining equations (j3.2ip . (|3.22p . and (j3.23p . we obtain for /3i > /32 that 

d\2 



"^>'^-C^'^\lif'-'^^^.(-'^]. II«II>1. (3.24) 



/32 



Since the right hand side above goes to infinity as $\|\xi\| \to \infty$, we get by Corollary 3.2 that $\mathcal{H}_{M_{\beta_1}} \nsubseteq \mathcal{H}_{M_{\beta_2}}$
when $\beta_1 > \beta_2$.

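By the monotone convergence theorem, we have by equation (3.15) for all $\beta > 0$ that

$$\lim_{\xi \to 0} m_\beta(\xi) = \begin{cases} +\infty, & \text{if } \beta \le \frac{d}{2}, \\[4pt] \dfrac{1}{(2\sqrt{\pi})^d}\dfrac{\Gamma(\beta - \frac{d}{2})}{\Gamma(\beta)}, & \text{if } \beta > \frac{d}{2}. \end{cases} \tag{3.25}$$

Therefore, if $\beta_1 \le \frac{d}{2} < \beta_2$ then $m_{\beta_1}(\xi)/m_{\beta_2}(\xi)$ is unbounded on a neighborhood of the origin. As a
consequence, $\mathcal{H}_{M_{\beta_1}} \nsubseteq \mathcal{H}_{M_{\beta_2}}$ in this case.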

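Suppose that $\frac{d}{2} < \beta_1 < \beta_2$. Then by (3.25), $m_{\beta_1}(\xi)/m_{\beta_2}(\xi)$ is bounded on a neighborhood of the
origin. Also, by (3.24),

$$\lim_{\|\xi\| \to \infty} \frac{m_{\beta_1}(\xi)}{m_{\beta_2}(\xi)} = 0.$$

As $m_{\beta_1}(\xi)/m_{\beta_2}(\xi)$ is continuous on $\mathbb{R}^d \setminus \{0\}$, it is hence bounded therein. By Corollary 3.2, $\mathcal{H}_{M_{\beta_1}} \subseteq
\mathcal{H}_{M_{\beta_2}}$ when $\frac{d}{2} < \beta_1 < \beta_2$.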

We now discuss the last case that /3i < /32 < f • We shall show that in this case T-Lm^^ ^ ^AZ/^j ^■^ 
proving that 'Ti/3i(0/"^fe(0 is unbounded on a neighborhood of the origin. To this end, let ||^|| < 1 
and use the change of variables t = in (j3.15p to get that 



_ ||t||2(/3i-/32)£(^7o 



Thus, if /32 < f then we have for ||^|| < 1 that 



> iieih 



2 1 exp ( — — s\ ds 



m,,[i)-- r(A) r,,ft-Mexpf-i-U» 



4s^ 

The right hand side above is unbounded when || — )■ 0. When /32 = |, we estimate that 

A change of variables = t then yields for ||^|| < 1 that 

/ -e-\m'ds= -e-'dt< -dt+ — dt = -21n(||e||)+ / — dt. 
Combining the above two equations with (3.26) yields that
> ll^ll 







2 1 exp \ — s\ ds 



1 

4^ 



The right hand side above goes to infinity as ||.^|| — t- 0. The proof is complete. 



1 1 r°° e"* 

-e~-sds+ / — dt-21n(||e||) 
s Ji t 



II^IKi- 



□ 



The main purpose of this section is to explore the inclusion relationships among RKHS of different 
kinds of translation invariant kernels given above. We present the results in the form of a table. 

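Theorem 3.7 Let $p \in 2\mathbb{N}$ and $\gamma, \sigma_1, \sigma_2, \beta, \tau$ be positive constants. The following inclusion relations
of RKHS hold true. The entry in a given row and column states the relation of the RKHS labelling the row to the RKHS labelling the column.

| | $\mathcal{H}_{B_p}$ | $\mathcal{H}_{G_\gamma}$ | $\mathcal{H}_{E_{\sigma_1}}$ | $\mathcal{H}_{\mathcal{E}_{\sigma_2}}$ | $\mathcal{H}_{M_\beta}$ | $\mathcal{H}_{A_\tau}$ |
|---|---|---|---|---|---|---|
| $\mathcal{H}_{B_p}$ | | $\nsubseteq$ | $\subseteq$ | $\subseteq$ iff $p \ge d+1$ | $\nsubseteq$ | $\nsubseteq$ |
| $\mathcal{H}_{G_\gamma}$ | $\nsubseteq$ | | $\subseteq$ | $\subseteq$ | $\subseteq$ | $\subseteq$ iff $\gamma \ge \tau$ |
| $\mathcal{H}_{E_{\sigma_1}}$ | $\nsubseteq$ | $\nsubseteq$ | | $\nsubseteq$ if $d \ge 2$ | $\nsubseteq$ | $\nsubseteq$ if $d \ge 2$ |
| $\mathcal{H}_{\mathcal{E}_{\sigma_2}}$ | $\nsubseteq$ | $\nsubseteq$ | $\nsubseteq$ if $d \ge 2$ | | $\nsubseteq$ | $\nsubseteq$ if $d \ge 2$ |
| $\mathcal{H}_{M_\beta}$ | $\nsubseteq$ | $\nsubseteq$ | $\subseteq$ iff $\beta > \frac{d}{2}$ | $\subseteq$ iff $\beta > \frac{d}{2}$ | | $\nsubseteq$ |
| $\mathcal{H}_{A_\tau}$ | $\nsubseteq$ | $\nsubseteq$ if $d \ge 2$ | $\nsubseteq$ if $d \ge 2$ | $\nsubseteq$ if $d \ge 2$ | $\nsubseteq$ | |
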

We break the task of proving this result into several steps as follows. 

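(i) For any dimension $d \in \mathbb{N}$ and parameters $p \in 2\mathbb{N}$, $\gamma, \tau > 0$, $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_K$ and $\mathcal{H}_K \nsubseteq \mathcal{H}_{B_p}$ for
$K = G_\gamma$ or $K = A_\tau$.

Proof: We first discuss the case when $K = G_\gamma$. It is clear that $b_p/g_\gamma$ is unbounded on $\mathbb{R}^d$. By
Corollary 3.2, $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_{G_\gamma}$. On the other hand, $b_p$ possesses zeros on $\mathbb{R}^d$ while $g_\gamma$ is everywhere
positive. As they are both continuous, there does not exist a positive constant $\lambda > 0$ such that
$g_\gamma(\xi) \le \lambda b_p(\xi)$ for almost every $\xi \in \mathbb{R}^d$. As a consequence, we obtain by Corollary 3.2 that
$\mathcal{H}_{G_\gamma} \nsubseteq \mathcal{H}_{B_p}$. The other case when $K = A_\tau$ can be handled in a similar way. □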

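(ii) For any $d \in \mathbb{N}$, $\sigma_2 > 0$ and $p \in 2\mathbb{N}$, $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{B_p}$. There holds $\mathcal{H}_{B_p} \subseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$ if and only if $p \ge d + 1$,
in which case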

2P-'' + 



\{Bp, <5o-2) < 



d-1 -p(d±l\ 



(3.27) 



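Proof: The function $\psi_{\sigma_2}$ in (3.12) is continuous and positive everywhere on $\mathbb{R}^d$. By arguments
used before, $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{B_p}$. Assume that $p < d + 1$. We choose $\xi_1 = (2n + 1)\pi$ and $\xi_j = 0$ for
$j \ge 2$ to get that $b_p(\xi) = O(n^{-p})$ while $\psi_{\sigma_2}(\xi) = O(n^{-(d+1)})$ as $n$ tends to infinity. Therefore,
$b_p/\psi_{\sigma_2}$ is unbounded on $\mathbb{R}^d$, implying that $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$.
Suppose that $p \ge d + 1$. If $\|\xi\|_\infty := \max\{|\xi_j| : j \in \mathbb{N}_d\} \le 1$ then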



(27r 



I TT 2 



cm 



d+l 



(l + aid)- 
It follows that



■I 

When llClloo > 1, 



1 TP 



oo 



which implies hy p > d + 1 that for ||^||oo > 



d+l d+1 , , d+1 



2 



-(2^)'ir(^) afiieiiSo - {2cj^^Yv{^)\mi 

d+l 

TP vr 2 „ d+l 

< 7 — J7-i-(l + crid)~. 

By Corollary 13.21 the above inequality together with (|3.28p proves (j3.27p . □ 
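(iii) For any $d \in \mathbb{N}$, $\sigma_1 > 0$ and $p \in 2\mathbb{N}$, $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{B_p}$. There holds $\mathcal{H}_{B_p} \subseteq \mathcal{H}_{E_{\sigma_1}}$ and

$$\lambda(B_p, E_{\sigma_1}) \le 2^d\Big(\sigma_1 + \frac{1}{\sigma_1}\Big)^d. \tag{3.29}$$

Proof: The relation $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{B_p}$ follows from the fact that $\varphi_{\sigma_1}$ is positive and continuous everywhere on
$\mathbb{R}^d$. Using an estimate method similar to that in (ii), we get that

$$\big(\operatorname{sinc}_{\frac12}(t)\big)^p(1 + \sigma_1^2 t^2) \le \big(\operatorname{sinc}_{\frac12}(t)\big)^2(1 + \sigma_1^2 t^2) \le 4(1 + \sigma_1^2) \quad \text{for all } t \in \mathbb{R},$$

which combined with the explicit form of $b_p$ and $\varphi_{\sigma_1}$ leads to (3.29). □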
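The one-dimensional inequality used in the proof of (iii) can be sanity-checked on a grid. The sketch below assumes $\operatorname{sinc}_{1/2}(t) = \sin(t/2)/(t/2)$ and evaluates $(\operatorname{sinc}_{1/2}(t))^p(1 + \sigma^2 t^2)$ for an even $p$; the grid range, resolution and function names are illustrative choices. A grid check illustrates the bound and does not prove it.

```python
import math


def sinc_half(t):
    """sin(t/2) / (t/2), with value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(t / 2.0) / (t / 2.0)


def max_lhs(sigma, p=2, t_max=200.0, n=200001):
    """Grid maximum of sinc_half(t)^p * (1 + sigma^2 t^2) over [-t_max, t_max]."""
    best = 0.0
    for k in range(n):
        t = -t_max + 2.0 * t_max * k / (n - 1)
        best = max(best, sinc_half(t) ** p * (1.0 + sigma * sigma * t * t))
    return best


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 5.0):
        print(sigma, max_lhs(sigma), 4.0 * (1.0 + sigma * sigma))
```

The bound is not tight: since $(\operatorname{sinc}_{1/2}(t))^2\sigma^2 t^2 = 4\sigma^2\sin^2(t/2)$, the left-hand side never exceeds $1 + 4\sigma^2$.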
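(iv) For any $d \in \mathbb{N}$, $\sigma_1 > 0$ and $\gamma > 0$, $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{G_\gamma}$. There holds $\mathcal{H}_{G_\gamma} \subseteq \mathcal{H}_{E_{\sigma_1}}$ and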

XiG„E^,)< (^max(l, ^) . 



(3.30) 



Proof: It is clear that ^Pa^/g'y is unbounded on M.'^. By Corollary 13. 2[ "He^ ^ '^G^- On the other 
hand, one has that 

which together with the observation that 

(l+^72c|)<max(l,-i)exp(^ijlj, G M 
proves ([H3U|) . □ 
(v) For any $d \in \mathbb{N}$, $\sigma_2 > 0$ and $\gamma > 0$, $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{G_\gamma}$. There holds $\mathcal{H}_{G_\gamma} \subseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$ and

A(G^, < (^max(l, ) j j ^^i±Ty- (3-31) 

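However, $\lambda(G_\gamma, \mathcal{E}_{\sigma_2})$ does not have a common upper bound as $d$ varies on $\mathbb{N}$.

Proof: As $\psi_{\sigma_2}/g_\gamma$ is clearly unbounded on $\mathbb{R}^d$, $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{G_\gamma}$. We then estimate that for all $\xi \in \mathbb{R}^d$,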



)— < (^max(l, ^ j < (^niax(l, ^ j exp(- 



4 



which immediately implies that $g_\gamma(\xi)/\psi_{\sigma_2}(\xi)$ is bounded by the right hand side of (3.31).
Equation (3.31) now follows from Corollary 3.2.

To prove the third claim, we use the Laplace transform representations (3.18) and (3.19). One
observes that the Gaussian kernel $G_\gamma$ corresponds to the delta measure $\delta_{\gamma^{-1}}$, which is singular
with respect to the Lebesgue measure, while $\mathcal{E}_{\sigma_2}$ is represented by the Borel measure

$$\frac{1}{2\sigma_2\sqrt{\pi}}\exp\Big(-\frac{1}{4\sigma_2^2 t}\Big)\frac{1}{t^{3/2}}\,dt,$$

which is absolutely continuous with respect to the Lebesgue measure. Thus, $\delta_{\gamma^{-1}}$ is not absolutely
continuous with respect to the above measure. By Proposition 3.4, $\lambda(G_\gamma, \mathcal{E}_{\sigma_2})$ does not have a
common upper bound as the dimension $d$ varies on $\mathbb{N}$. □



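(vi) For any $d \ge 2$, $\sigma_1 > 0$ and $\sigma_2 > 0$, $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$ and $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{E_{\sigma_1}}$.

Proof: We first let $\xi_1 = n$ and $\xi_j = 0$ for $j \ge 2$ to get that $\varphi_{\sigma_1}(\xi) = O(n^{-2})$ and $\psi_{\sigma_2}(\xi) =
O(n^{-(d+1)})$ as $n$ tends to infinity. As $d \ge 2$, $\varphi_{\sigma_1}(\xi)/\psi_{\sigma_2}(\xi)$ is unbounded on $\mathbb{R}^d$, implying
that $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$. The choice $\xi_j = n$ for all $j \in \mathbb{N}_d$ yields that $\varphi_{\sigma_1}(\xi) = O(n^{-2d})$ and
$\psi_{\sigma_2}(\xi) = O(n^{-(d+1)})$ as $n \to \infty$. Therefore, $\psi_{\sigma_2}(\xi)/\varphi_{\sigma_1}(\xi)$ is unbounded on $\mathbb{R}^d$. It implies that
$\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{E_{\sigma_1}}$. □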



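(vii) For any $d \ge 2$, $\sigma_1, \sigma_2, \tau > 0$, $\mathcal{H}_{A_\tau} \nsubseteq \mathcal{H}_K$ and $\mathcal{H}_K \nsubseteq \mathcal{H}_{A_\tau}$ for either $K = E_{\sigma_1}$ or $K = \mathcal{E}_{\sigma_2}$.

Proof: We discuss $K = E_{\sigma_1}$ only as the other case can be dealt with similarly. Choosing $\xi_j = n$
for all $j \in \mathbb{N}_d$ yields that $\varphi_{\sigma_1}(\xi)/a_\tau(\xi) \to \infty$ as $n \to \infty$. The other choice $\xi_1 = n$ and $\xi_j = 0$ for
$j \ge 2$ tells that $a_\tau(\xi)/\varphi_{\sigma_1}(\xi) \to \infty$ as $n \to \infty$. Therefore, neither $\varphi_{\sigma_1}/a_\tau$ nor $a_\tau/\varphi_{\sigma_1}$ is bounded
on $\mathbb{R}^d$. The result now follows from Corollary 3.2. □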



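(viii) For any $d \ge 2$, $\gamma, \tau > 0$, $\mathcal{H}_{A_\tau} \nsubseteq \mathcal{H}_{G_\gamma}$. There holds $\mathcal{H}_{G_\gamma} \subseteq \mathcal{H}_{A_\tau}$ if and only if $\gamma \ge \tau$, in which
case

$$\lambda(G_\gamma, A_\tau) = \frac{\gamma^{d/2}}{d\sqrt{\tau}\,(2\sqrt{\pi})^{d-1}}. \tag{3.32}$$

Proof: That $\mathcal{H}_{A_\tau} \nsubseteq \mathcal{H}_{G_\gamma}$ can be proved in a way similar to that in (vii). If $\gamma < \tau$ then we set
$\xi_j = n$ for all $j \in \mathbb{N}_d$ to see that $g_\gamma(\xi)/a_\tau(\xi) \to \infty$ as $n \to \infty$. Thus, $\mathcal{H}_{G_\gamma} \nsubseteq \mathcal{H}_{A_\tau}$ in this case.
Suppose that $\gamma \ge \tau$. We get for all $\xi \in \mathbb{R}^d$ that

$$\frac{g_\gamma(\xi)}{a_\tau(\xi)} = \frac{(\sqrt{\gamma})^d \exp\big(-\frac{\gamma\|\xi\|^2}{4}\big)}{\sqrt{\tau}\,(2\sqrt{\pi})^{d-1}\sum_{j=1}^d \exp\big(-\frac{\tau\xi_j^2}{4}\big)},$$

which together with the observation that

$$\exp\Big(-\frac{\gamma\|\xi\|^2}{4}\Big) \le \exp\Big(-\frac{\tau\xi_j^2}{4}\Big) \quad \text{for all } j \in \mathbb{N}_d$$

implies that

$$\frac{g_\gamma(\xi)}{a_\tau(\xi)} \le \frac{\gamma^{d/2}}{d\sqrt{\tau}\,(2\sqrt{\pi})^{d-1}} \quad \text{for all } \xi \in \mathbb{R}^d.$$

As the equality is achieved at $\xi = 0$, we obtain (3.32). □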

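(ix) For any $d \in \mathbb{N}$, $\sigma_2, \beta > 0$, $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{M_\beta}$. There holds $\mathcal{H}_{M_\beta} \subseteq \mathcal{H}_{\mathcal{E}_{\sigma_2}}$ if and only if $\beta > \frac{d}{2}$.
Proof: By (3.13) and (3.15), we have for all $\xi \in \mathbb{R}^d$ that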



, _^ rt/3-f-i(l + ^2||^||2)^expf-^-Ad*- (3-33) 

Note that when ||^|| > 1, 

Thus, for llell > 1 

We hence get that $m_\beta(\xi)/\psi_{\sigma_2}(\xi) \to 0$ as $\|\xi\| \to \infty$. It implies that $\mathcal{H}_{\mathcal{E}_{\sigma_2}} \nsubseteq \mathcal{H}_{M_\beta}$.
To prove the rest of the claims, one first sees by the Lebesgue dominated convergence theorem
that $m_\beta/\psi_{\sigma_2}$ is continuous on $\mathbb{R}^d \setminus \{0\}$. We also have that $m_\beta(\xi)/\psi_{\sigma_2}(\xi) \to 0$ as $\|\xi\| \to \infty$.
For these two reasons, $m_\beta/\psi_{\sigma_2}$ is essentially bounded on $\mathbb{R}^d$ if and only if it is bounded on a
neighborhood of the origin. If $\beta > \frac{d}{2}$, we observe that when $\|\xi\| \le 1$,



f t^-i-Hi + ^m')"^ -p("^ - < (1 + ai)^ t 



^~i~^e~*dt < +00, 



which implies that 771^/^/^0-2 is essentially bounded on M when (3 > We hence get by Corollary 
1/3 ^ ^^^,72 i'^ ^'^i^ case. When /3 < |, 



T2]that C 7^^^ in this case. When /3 < ^, by the monotone convergence theorem, 



lim rt^~i-\l+alU\\^)'^exp(-^^-t]dt= H t^-i-^e-'dt = +00. 
It follows from the above equation that % Ma ^ T~i£a^ when /3 < | . □ 
(x) For any $d \in \mathbb{N}$, $\sigma_1, \beta > 0$, $\mathcal{H}_{E_{\sigma_1}} \nsubseteq \mathcal{H}_{M_\beta}$. There holds $\mathcal{H}_{M_\beta} \subseteq \mathcal{H}_{E_{\sigma_1}}$ if and only if $\beta > \frac{d}{2}$.
Proof: The proof is similar to that for (ix). □

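(xi) For any $d \in \mathbb{N}$, $p \in 2\mathbb{N}$, $\beta > 0$, $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_{M_\beta}$ and $\mathcal{H}_{M_\beta} \nsubseteq \mathcal{H}_{B_p}$.

Proof: As $m_\beta$ is positive and continuous on $\mathbb{R}^d \setminus \{0\}$ while $b_p$ has zeros on $\mathbb{R}^d \setminus \{0\}$, $\mathcal{H}_{M_\beta} \nsubseteq \mathcal{H}_{B_p}$.
That $\mathcal{H}_{B_p} \nsubseteq \mathcal{H}_{M_\beta}$ can be proved by arguments similar to those in (ix). □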

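(xii) For any $d \in \mathbb{N}$, $\gamma, \beta > 0$, $\mathcal{H}_{M_\beta} \nsubseteq \mathcal{H}_{G_\gamma}$ but $\mathcal{H}_{G_\gamma} \subseteq \mathcal{H}_{M_\beta}$. The quantity $\lambda(G_\gamma, M_\beta)$ does not have
a common upper bound as $d$ varies on $\mathbb{N}$.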

Proof: We start with the observation that 



diii) T{l3)-fi Jo 



4 / " V At 



r(/3)7 



2 V 4 4t 



7 



1 



> ^expl^-^l / t^"2-ie"*dt. 



r(/3)7^ 

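Therefore, $m_\beta(\xi)/g_\gamma(\xi)$ tends to infinity as $\|\xi\| \to \infty$. Consequently, $\mathcal{H}_{M_\beta} \nsubseteq \mathcal{H}_{G_\gamma}$.
We also notice by the monotone convergence theorem that

$$\lim_{\|\xi\| \to 0} \frac{m_\beta(\xi)}{g_\gamma(\xi)} = \frac{1}{\Gamma(\beta)\gamma^{d/2}}\int_0^\infty t^{\beta - \frac{d}{2} - 1}e^{-t}\,dt > 0.$$

As $m_\beta/g_\gamma$ is continuous and positive everywhere on $\mathbb{R}^d \setminus \{0\}$, the above two estimates imply that
there exists some positive constant $\lambda$ such that

$$\frac{g_\gamma(\xi)}{m_\beta(\xi)} \le \lambda \quad \text{for all } \xi \in \mathbb{R}^d \setminus \{0\}.$$

We hence conclude that $\mathcal{H}_{G_\gamma} \subseteq \mathcal{H}_{M_\beta}$. Recall (3.18) and (3.20). Since $G_\gamma$ and $M_\beta$ are respectively
represented by measures singular and absolutely continuous with respect to the Lebesgue
measure, $\lambda(G_\gamma, M_\beta)$ does not have a common upper bound for $d \in \mathbb{N}$. □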

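(xiii) For any $d \in \mathbb{N}$, $\tau, \beta > 0$, $\mathcal{H}_{A_\tau} \nsubseteq \mathcal{H}_{M_\beta}$ and $\mathcal{H}_{M_\beta} \nsubseteq \mathcal{H}_{A_\tau}$.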

Proof: Firstly, we see for the choice .^i = n, = 0, j > 2 that 



lim ?7i^(^) = while lim ar{C) 



2^ 



As a result, HAr ^ '^Mp- Secondly, arguments similar to those in (xii) shows that for the choice 
= n,j€ Nd 

hm 7— = +CX), 

n->-oo ar[q) 

which implies that T-Lm^ ^ T^At ■ ^ 
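We close this section with the sinc kernel (2.4).

Corollary 3.8 There holds for all $\gamma, \tau > 0$ and $d \in \mathbb{N}$ that

$$\lambda(\operatorname{sinc}, G_\gamma) = \frac{\exp\big(\frac{d\gamma\pi^2}{4}\big)}{(\gamma\pi)^{d/2}}, \qquad \lambda(\operatorname{sinc}, A_\tau) = \frac{\exp\big(\frac{\tau\pi^2}{4}\big)}{2^{d-1}\pi^{d-\frac12}\,d\sqrt{\tau}}. \tag{3.34}$$

Consequently, $\mathcal{H}_{\operatorname{sinc}} \subseteq \mathcal{H}_K$ for $K = E_{\sigma_1}$, $\mathcal{E}_{\sigma_2}$, and $M_\beta$.

Proof: Equation (3.34) follows from a straightforward calculation. □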

4 Hilbert- Schmidt Kernels 



By Mercer's theorem [12], Hilbert-Schmidt kernels represent a large class of reproducing kernels. They 
were recently used to construct multiscale kernels based on wavelets [1^. We introduce the general 
form of Hilbert-Schmidt kernels. 

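Let $a$ be a nonnegative function on $\mathbb{N}$ and set $a_n := a(n)$, $n \in \mathbb{N}$. We denote by $\ell^2_a(\mathbb{N})$ the Hilbert
space of functions $c$ on $\mathbb{N}$ such that

$$\|c\|_{\ell^2_a(\mathbb{N})} := \Big(\sum_{n=1}^\infty a_n|c_n|^2\Big)^{1/2} < +\infty.$$

Its inner product is given by

$$(c, d)_{\ell^2_a(\mathbb{N})} := \sum_{n=1}^\infty a_n c_n \overline{d_n}, \quad c, d \in \ell^2_a(\mathbb{N}).$$

Suppose that we have a sequence of functions $\phi_n$, $n \in \mathbb{N}$, on the input space $X$, such that for each
$x \in X$ the function $\Phi(x)$ defined on $\mathbb{N}$ as

$$\Phi(x)(n) := \phi_n(x), \quad n \in \mathbb{N} \tag{4.1}$$

belongs to $\ell^2_a(\mathbb{N})$. The Hilbert-Schmidt kernel $K_a$ associated with $a$ is given as

$$K_a(x, y) := (\Phi(x), \Phi(y))_{\ell^2_a(\mathbb{N})} = \sum_{n=1}^\infty a_n\phi_n(x)\overline{\phi_n(y)}, \quad x, y \in X. \tag{4.2}$$

Now suppose that there exists another nonnegative function $b$ on $\mathbb{N}$ such that $\Phi(x) \in \ell^2_b(\mathbb{N})$ for all
$x \in X$. Set

$$K_b(x, y) := (\Phi(x), \Phi(y))_{\ell^2_b(\mathbb{N})} = \sum_{n=1}^\infty b_n\phi_n(x)\overline{\phi_n(y)}, \quad x, y \in X. \tag{4.3}$$

We shall characterize $\mathcal{H}_{K_a} \subseteq \mathcal{H}_{K_b}$ in terms of $a$ and $b$.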

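Proposition 4.1 Suppose that $b$ is nontrivial, and $\operatorname{span}\{\Phi(x) : x \in X\}$ is dense in both $\ell^2_a(\mathbb{N})$ and
$\ell^2_b(\mathbb{N})$. Then $\mathcal{H}_{K_a} \subseteq \mathcal{H}_{K_b}$ if and only if there is a constant $\lambda \ge 0$ such that $a_n \le \lambda b_n$ for all $n \in \mathbb{N}$. In
this case,

$$\lambda(K_a, K_b) = \sup\Big\{\frac{a_n}{b_n} : n \in \mathbb{N},\ b_n > 0\Big\}. \tag{4.4}$$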
Proof: By Lemma 2.1, the space $\mathcal{H}_{K_a}$ consists of functions of the form

$$f_c(x) := (c, \Phi(x))_{\ell^2_a(\mathbb{N})} = \sum_{n=1}^{\infty} c_n a_n \overline{\phi_n(x)}, \quad x \in X,\ c \in \ell^2_a(\mathbb{N}) \tag{4.5}$$

with the norm

$$\|f_c\|_{\mathcal{H}_{K_a}} = \|c\|_{\ell^2_a(\mathbb{N})}.$$

Similarly, one has the structure of the space $\mathcal{H}_{K_b}$.

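Suppose that there exists some constant $\lambda > 0$ such that $a_n \le \lambda b_n$ for all $n \in \mathbb{N}$. Let $c$ be an arbitrary but fixed element in $\ell^2_a(\mathbb{N})$ and set

$$\tilde{c}_n := \begin{cases} 0, & \text{if } a_n = 0, \\ \dfrac{a_n c_n}{b_n}, & \text{otherwise.} \end{cases}$$

One sees that $\tilde{c} \in \ell^2_b(\mathbb{N})$ and that $(\tilde{c}, \Phi(\cdot))_{\ell^2_b(\mathbb{N})} = f_c$. Thus, $f_c \in \mathcal{H}_{K_b}$, implying that $\mathcal{H}_{K_a} \subseteq \mathcal{H}_{K_b}$. Another observation is that

$$\|f_c\|^2_{\mathcal{H}_{K_b}} = \|\tilde{c}\|^2_{\ell^2_b(\mathbb{N})} = \sum_{n \in \mathbb{N},\, a_n \ne 0} \frac{a_n}{b_n}\, a_n |c_n|^2 \le \sup\{a_n / b_n : n \in \mathbb{N},\ a_n \ne 0\}\, \|f_c\|^2_{\mathcal{H}_{K_a}}.$$

Moreover, for any $k \in \mathbb{N}$ with $a_k > 0$, the particular choice $c(n) := \delta_{n,k}$, $n \in \mathbb{N}$, where $\delta_{n,k}$ denotes the Kronecker delta, yields that

$$\|f_c\|^2_{\mathcal{H}_{K_b}} = \frac{a_k}{b_k}\, \|f_c\|^2_{\mathcal{H}_{K_a}}.$$

The above two equations together imply by Proposition 2.3 that

$$\lambda(K_a, K_b) = \beta(K_a, K_b)^2 = \sup\Big\{ \frac{a_n}{b_n} : n \in \mathbb{N},\ b_n > 0 \Big\}.$$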

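Conversely, suppose that $\mathcal{H}_{K_a} \subseteq \mathcal{H}_{K_b}$. As the embedding operator is bounded, there exists $\lambda > 0$ such that $\|f\|_{\mathcal{H}_{K_b}} \le \sqrt{\lambda}\, \|f\|_{\mathcal{H}_{K_a}}$ for all $f \in \mathcal{H}_{K_a}$. For any $k \in \mathbb{N}$ with $a_k > 0$, we still choose $c(n) := \delta_{n,k}$, $n \in \mathbb{N}$, to get from $f_c \in \mathcal{H}_{K_b}$ that $b_k > 0$ and that

$$\frac{a_k^2}{b_k} \le \lambda a_k,$$

which implies that $a_k \le \lambda b_k$. The proof is complete. □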
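The criterion of Proposition 4.1 can be tried out on finite weight sequences. The following is a minimal sketch, assuming that the sequences $a$ and $b$ have been truncated to finitely many terms and that the density hypothesis of the proposition holds; the function name `inclusion_constant` and the sample weights are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from typing import Optional, Sequence


def inclusion_constant(a: Sequence[Fraction],
                       b: Sequence[Fraction]) -> Optional[Fraction]:
    """Return sup{a_n / b_n : b_n > 0} for finite weight sequences.

    Returns None when some a_n > 0 has b_n = 0, since then no constant
    lam with a_n <= lam * b_n for all n can exist.
    """
    if len(a) != len(b):
        raise ValueError("weight sequences must have the same length")
    best = Fraction(0)
    for a_n, b_n in zip(a, b):
        if a_n < 0 or b_n < 0:
            raise ValueError("weights must be nonnegative")
        if b_n == 0:
            if a_n > 0:
                return None  # the criterion a_n <= lam * b_n fails at this index
            continue
        best = max(best, Fraction(a_n) / Fraction(b_n))
    return best


# Hypothetical sample weights (not taken from the paper).
a = [Fraction(1), Fraction(1, 2), Fraction(0), Fraction(1, 8)]
b = [Fraction(2), Fraction(1, 4), Fraction(1), Fraction(1, 8)]

print(inclusion_constant(a, b))  # largest of the ratios 1/2, 2, 0, 1
print(inclusion_constant(b, a))  # None: b_3 > 0 while a_3 = 0
```

Exact rational arithmetic is used so that the supremum is not blurred by rounding.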



Before we give examples of inclusion relations for Hilbert-Schmidt kernels by Proposition 4.1, we remark that Proposition 4.1 actually leads to a characterization of Hilbert-Schmidt kernels.



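Theorem 4.2 Let $r$ be a function on $\mathbb{N}$. Suppose that $\Phi(x) \in \ell^2_{|r|}(\mathbb{N})$ for all $x \in X$ and $\mathrm{span}\,\Phi(X)$ is dense in $\ell^2_{|r|}(\mathbb{N})$. Then

$$K_r(x, y) := \sum_{n=1}^{\infty} r_n \phi_n(x) \overline{\phi_n(y)}, \quad x, y \in X \tag{4.6}$$

defines a kernel on $X$ if and only if $r_n \ge 0$ for each $n \in \mathbb{N}$.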



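Proof: The sufficiency is well-known. We prove the necessity by contradiction. Assume that $K_r$ given by (4.6) is a kernel but $r_{j_0} < 0$ for some $j_0 \in \mathbb{N}$. Then we introduce two nonnegative functions $a$ and $b$ on $\mathbb{N}$ by setting

$$a_n := \begin{cases} 2|r_n|, & n \ne j_0, \\ -r_{j_0}, & n = j_0, \end{cases}$$

and

$$b_n := \begin{cases} 2|r_n| + r_n, & n \ne j_0, \\ 0, & n = j_0. \end{cases}$$

Then it is clear that $\Phi(x) \in \ell^2_a(\mathbb{N})$ and $\Phi(x) \in \ell^2_b(\mathbb{N})$ for all $x \in X$. Moreover, $\mathrm{span}\,\Phi(X)$ is dense in $\ell^2_a(\mathbb{N})$ and $\ell^2_b(\mathbb{N})$ as it is in $\ell^2_{|r|}(\mathbb{N})$. Therefore, $K_a$ and $K_b$ are Hilbert-Schmidt kernels on $X$. Note that $K_b - K_a = K_r$. By the assumption, $K_a \ll K_b$. Thus by Proposition 4.1 there exists some $\lambda > 0$ such that $a_n \le \lambda b_n$ for all $n \in \mathbb{N}$. Especially when $n = j_0$, we have $-r_{j_0} \le \lambda \cdot 0 = 0$, contradicting that $r_{j_0} < 0$. □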

As an application of the above theorem, we discuss an important and celebrated result which was proved before by rather sophisticated mathematical analysis [16]. Suppose that the power series

$$\sum_{n=0}^{\infty} a_n z^n$$

has a positive convergence radius $r$. Then by Theorem 4.2 or [16],

$$K(x, y) := \sum_{n=0}^{\infty} a_n (x, y)^n, \quad x, y \in \mathbb{R}^d,\ \|x\|, \|y\| < r^{1/2}$$

is a reproducing kernel on $\{x \in \mathbb{R}^d : \|x\| < r^{1/2}\}$ if and only if $a_n \ge 0$ for all $n \ge 0$.

We close this section with a few examples that fall into the consideration of Proposition 4.1. We shall not state the results explicitly as they would just be repetition of those in Proposition 4.1.

- (Discrete Exponential Kernels) Let $\xi_n$, $n \in \mathbb{N}$, be a sequence of pairwise distinct points in $\mathbb{R}^d$ and let $a, b$ be two nonnegative functions in $\ell^1(\mathbb{N})$. The associated discrete exponential kernels are given by

$$K_a(x, y) := \sum_{n=1}^{\infty} a_n e^{i(x - y, \xi_n)}, \quad K_b(x, y) := \sum_{n=1}^{\infty} b_n e^{i(x - y, \xi_n)}, \quad x, y \in \mathbb{R}^d.$$

Useful examples of discrete exponential kernels include the periodic kernels (see, for example, [17], page 103). We present three instances below. Let $\gamma, \sigma$ be positive constants and $\alpha > d$. Define

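$$G_\gamma(x, y) := \sum_{n \in \mathbb{Z}^d} e^{i(x - y, n)} e^{-\gamma \|n\|^2}, \quad x, y \in [0, 2\pi]^d,$$
$$K_\sigma(x, y) := \sum_{n \in \mathbb{Z}^d} e^{i(x - y, n)} e^{-\sigma \|n\|}, \quad x, y \in [0, 2\pi]^d,$$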

and 

Pa(x,y) := V e^^^-S''")- J-^, x,y £ [0,271^. 

Then by Proposition 14. H we clearly have that Hf, ^Hp ■ 
- (Polynomial Kernels) Let $a, b$ be two nonnegative functions on $\mathbb{N}_+ := \mathbb{N} \cup \{0\}$. Suppose that $\sum_{n=0}^{\infty} a_n z^n$ and $\sum_{n=0}^{\infty} b_n z^n$ have positive convergence radii $r_a$ and $r_b$, respectively. Then the polynomial kernels

$$K_a(x, y) := \sum_{n=0}^{\infty} a_n (x, y)^n, \quad K_b(x, y) := \sum_{n=0}^{\infty} b_n (x, y)^n$$

on the input space $\{x \in \mathbb{R}^d : \|x\| < \min(\sqrt{r_a}, \sqrt{r_b})\}$ satisfy the assumptions of Proposition 4.1.

Especially, we have the following simple observation about finite polynomial kernels.

Proposition 4.3 (Finite Polynomial Kernels) Let $p, q \in \mathbb{N}$ and put

$$K_p(x, y) := (1 + (x, y))^p, \quad x, y \in \mathbb{R}^d \tag{4.7}$$

and

$$K_q(x, y) := (1 + (x, y))^q, \quad x, y \in \mathbb{R}^d. \tag{4.8}$$

Then $\mathcal{H}_{K_p} \subseteq \mathcal{H}_{K_q}$ if and only if $p \le q$. When $p \le q$, $\lambda(K_p, K_q) = 1$.
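The value $\lambda(K_p, K_q) = 1$ can be checked numerically from formula (4.4). The sketch below assumes the binomial expansion $(1 + t)^p = \sum_n \binom{p}{n} t^n$, so that the weights of $K_p$ and $K_q$ are $a_n = \binom{p}{n}$ and $b_n = \binom{q}{n}$; the function name `lambda_finite_polynomial` is hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import Optional


def lambda_finite_polynomial(p: int, q: int) -> Optional[Fraction]:
    """sup over n of C(p, n) / C(q, n), following formula (4.4).

    Returns None when p > q: the weight C(p, p) = 1 is positive while
    C(q, p) = 0, so no constant lam with a_n <= lam * b_n exists.
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    if p > q:
        return None
    # For n > p the weight C(p, n) vanishes, so only n = 0, ..., p matter.
    return max(Fraction(comb(p, n), comb(q, n)) for n in range(p + 1))


print(lambda_finite_polynomial(2, 5))  # 1, attained at n = 0
print(lambda_finite_polynomial(4, 2))  # None, since p > q
```

The supremum is attained at $n = 0$, where both binomial coefficients equal 1, and every other ratio is at most 1 because $\binom{p}{n} \le \binom{q}{n}$ when $p \le q$.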

5 Constructional Results 

In this section, we discuss the preservation of the inclusion relation of RKHS under various operations 
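with the corresponding kernels. We start with some trivial observations from Lemma 2.2.

Proposition 5.1 Let $K_1, K_2, G_1, G_2, K, G$ be reproducing kernels on the input space $X$. Then the following results hold true:

i.) If $\mathcal{H}_{K_1} \subseteq \mathcal{H}_{G_1}$ and $\mathcal{H}_{K_2} \subseteq \mathcal{H}_{G_2}$ then $\mathcal{H}_{K_1 + K_2} \subseteq \mathcal{H}_{G_1 + G_2}$ and

$$\lambda(K_1 + K_2, G_1 + G_2) \le \max(\lambda(K_1, G_1), \lambda(K_2, G_2)).$$

ii.) Especially, if $\mathcal{H}_{K_1}$ and $\mathcal{H}_{K_2}$ are both contained in $\mathcal{H}_G$ then $\mathcal{H}_{K_1 + K_2} \subseteq \mathcal{H}_G$ and

$$\lambda(K_1 + K_2, G) \le \lambda(K_1, G) + \lambda(K_2, G).$$

iii.) If $\mathcal{H}_K \subseteq \mathcal{H}_G$ then for all $a, b > 0$, $\mathcal{H}_{aK} \subseteq \mathcal{H}_{bG}$ and

$$\lambda(aK, bG) = \frac{a}{b} \lambda(K, G).$$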

We next turn to the product of two kernels by first examining the more general tensor product of kernels. Let $K, G$ be two kernels on $X$. The tensor product $K \otimes G$ of $K, G$ is a new kernel on the extended input space $X \times X$ defined by

$$(K \otimes G)(x, y) := K(x_1, y_1) G(x_2, y_2), \quad x = (x_1, x_2),\ y = (y_1, y_2) \in X \times X.$$

For further discussion, we shall make use of the Schur product theorem [11]. For two square matrices $A, B$ of the same size, we denote by $A \circ B$ the Hadamard product of $A, B$, that is, $A \circ B$ is formed by pairwise multiplying elements from $A$ and $B$. The Schur product theorem asserts that the Hadamard product of two positive semi-definite matrices is still positive semi-definite.
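Proposition 5.2 Let $K_1, K_2, G_1, G_2$ be kernels on $X$. If $\mathcal{H}_{K_1} \subseteq \mathcal{H}_{G_1}$ and $\mathcal{H}_{K_2} \subseteq \mathcal{H}_{G_2}$ then $\mathcal{H}_{K_1 \otimes K_2} \subseteq \mathcal{H}_{G_1 \otimes G_2}$ and

$$\lambda(K_1 \otimes K_2, G_1 \otimes G_2) \le \lambda(K_1, G_1) \lambda(K_2, G_2).$$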

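Proof: For notational simplicity, put $\lambda_1 := \lambda(K_1, G_1)$ and $\lambda_2 := \lambda(K_2, G_2)$. We shall show that $K_1 \otimes K_2 \ll \lambda_1 \lambda_2 G_1 \otimes G_2$ by definition. Let $z := \{x^j : j \in \mathbb{N}_n\}$ be a finite set of pairwise distinct points in $X \times X$. Set $z_1 := \{x^j_1 : j \in \mathbb{N}_n\}$ and $z_2 := \{x^j_2 : j \in \mathbb{N}_n\}$. We observe that

$$(G_1 \otimes G_2)[z] = G_1[z_1] \circ G_2[z_2], \quad (K_1 \otimes K_2)[z] = K_1[z_1] \circ K_2[z_2].$$

By Proposition 2.3, $K_1 \ll \lambda_1 G_1$ and $K_2 \ll \lambda_2 G_2$. As a result, $\lambda_1 G_1[z_1] - K_1[z_1]$ and $\lambda_2 G_2[z_2] - K_2[z_2]$ are both positive semi-definite. We now compute that

$$\begin{aligned}
\lambda_1 \lambda_2 (G_1 \otimes G_2)[z] - (K_1 \otimes K_2)[z] &= \lambda_1 G_1[z_1] \circ \lambda_2 G_2[z_2] - K_1[z_1] \circ K_2[z_2] \\
&= (K_1[z_1] + (\lambda_1 G_1[z_1] - K_1[z_1])) \circ (K_2[z_2] + (\lambda_2 G_2[z_2] - K_2[z_2])) - K_1[z_1] \circ K_2[z_2] \\
&= K_1[z_1] \circ (\lambda_2 G_2[z_2] - K_2[z_2]) + (\lambda_1 G_1[z_1] - K_1[z_1]) \circ K_2[z_2] \\
&\quad + (\lambda_1 G_1[z_1] - K_1[z_1]) \circ (\lambda_2 G_2[z_2] - K_2[z_2]).
\end{aligned}$$

By the Schur product theorem, the three matrices in the last step above are all positive semi-definite. Therefore, $K_1 \otimes K_2 \ll \lambda_1 \lambda_2 G_1 \otimes G_2$. The proof is complete. □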
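The Hadamard-product decomposition used in the proof can be verified on a small instance. The sketch below uses hypothetical 2x2 integer Gram matrices (not taken from the paper) together with constants `lam1`, `lam2` chosen so that $\lambda_1 G_1 - K_1$ and $\lambda_2 G_2 - K_2$ are positive semi-definite; the helper names are illustrative only.

```python
from typing import List

Matrix = List[List[int]]


def hadamard(A: Matrix, B: Matrix) -> Matrix:
    """Entrywise (Hadamard) product of two matrices of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A: Matrix, B: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c: int, A: Matrix) -> Matrix:
    return [[c * x for x in row] for row in A]


def is_psd_2x2(M: Matrix) -> bool:
    """A real symmetric 2x2 matrix is positive semi-definite iff its
    diagonal entries and its determinant are nonnegative."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][1] == M[1][0] and M[0][0] >= 0 and M[1][1] >= 0 and det >= 0


# Hypothetical Gram matrices on two sample points.
K1, G1, lam1 = [[1, 1], [1, 1]], [[2, 1], [1, 2]], 1
K2, G2, lam2 = [[1, 1], [1, 1]], [[1, 0], [0, 1]], 2

D1 = sub(scale(lam1, G1), K1)  # lam1 * G1 - K1
D2 = sub(scale(lam2, G2), K2)  # lam2 * G2 - K2

lhs = sub(scale(lam1 * lam2, hadamard(G1, G2)), hadamard(K1, K2))
terms = [hadamard(K1, D2), hadamard(D1, K2), hadamard(D1, D2)]
rhs = add(add(terms[0], terms[1]), terms[2])

print(lhs == rhs)                          # the decomposition holds entrywise
print(all(is_psd_2x2(t) for t in terms))   # each summand is PSD (Schur)
```

The 2x2 test for positive semi-definiteness keeps the example free of external dependencies; for larger matrices an eigenvalue routine would be needed instead.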



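Corollary 5.3 Let $K_1, K_2, G_1, G_2$ be kernels on $X$. If $\mathcal{H}_{K_1} \subseteq \mathcal{H}_{G_1}$ and $\mathcal{H}_{K_2} \subseteq \mathcal{H}_{G_2}$ then $\mathcal{H}_{K_1 K_2} \subseteq \mathcal{H}_{G_1 G_2}$ and $\lambda(K_1 K_2, G_1 G_2) \le \lambda(K_1, G_1) \lambda(K_2, G_2)$.

Proof: The result follows from Proposition 5.2 and the observation that $K_1 K_2$ and $G_1 G_2$ can be viewed as the restriction of $K_1 \otimes K_2$ and $G_1 \otimes G_2$ on the diagonal of $X \times X$, respectively. □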

We next discuss limits of reproducing kernels. It is obvious by definition that the limit of a sequence 
of kernels remains a kernel [1] . 

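Proposition 5.4 Let $\{K_j : j \in \mathbb{N}\}$ and $\{G_j : j \in \mathbb{N}\}$ be two sequences of kernels on $X$ that converge pointwise to kernels $K$ and $G$, respectively. If $\mathcal{H}_{K_j} \subseteq \mathcal{H}_{G_j}$ for all $j \in \mathbb{N}$ and

$$\sup\{\lambda(K_j, G_j) : j \in \mathbb{N}\} < +\infty \tag{5.1}$$

then $\mathcal{H}_K \subseteq \mathcal{H}_G$ and $\lambda(K, G) \le \sup\{\lambda(K_j, G_j) : j \in \mathbb{N}\}$.

Proof: Suppose that $\mathcal{H}_{K_j} \subseteq \mathcal{H}_{G_j}$ for all $j \in \mathbb{N}$ and $\lambda := \sup\{\lambda(K_j, G_j) : j \in \mathbb{N}\} < +\infty$. Let $\mathbf{x}$ be a finite set of $n$ sampling points in $X$ and $y \in \mathbb{C}^n$ be fixed. Then as $K_j \ll \lambda G_j$, we have for all $j \in \mathbb{N}$ that

$$y^* (\lambda G_j[\mathbf{x}] - K_j[\mathbf{x}]) y \ge 0.$$

Taking the limit as $j \to \infty$, we get that

$$y^* (\lambda G[\mathbf{x}] - K[\mathbf{x}]) y \ge 0.$$

The proof is hence complete. □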

We remark that condition (5.1) may not be removed in the last proposition. For a simple counterexample, we let $G$ be an arbitrary nontrivial kernel on $X$ and set $K_j := G$ and $G_j := G/j$ for all $j \in \mathbb{N}$. It is clear that $\mathcal{H}_{K_j} = \mathcal{H}_{G_j} = \mathcal{H}_G$ for each $j \in \mathbb{N}$. But the limit of $G_j$ is the trivial kernel. The inclusion relation is hence not kept in the limit kernels. The reason is that $\lambda(K_j, G_j) = j$ is unbounded.

With the help of Propositions 5.1, 5.4 and Corollary 5.3 we are ready to give a main result of this section. We shall use a fact proved in [5] that if $K$ is a kernel and $\phi$ is analytic with nonnegative Taylor coefficients at the origin then $\phi(K)$ remains a kernel.

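Theorem 5.5 Let $K$ and $G$ be two kernels on $X$ with $\mathcal{H}_K \subseteq \mathcal{H}_G$. Then $\mathcal{H}_{e^K} \subseteq \mathcal{H}_{e^{\lambda(K,G)G}}$. In particular, if $\lambda(K, G) \le 1$ then $\mathcal{H}_{e^K} \subseteq \mathcal{H}_{e^G}$.

Proof: We may assume that $\lambda(K, G) \le 1$. Let $K_n := \sum_{j=0}^{n} \frac{K^j}{j!}$ and $G_n := \sum_{j=0}^{n} \frac{G^j}{j!}$ for each $n \in \mathbb{N}$. Then $K_n, G_n$ converge pointwise to $e^K$ and $e^G$, respectively. It also follows from Proposition 5.1 and Corollary 5.3 that

$$K_n \ll \Big( \max_{0 \le j \le n} \lambda(K, G)^j \Big) G_n.$$

It is clear that $\max_{0 \le j \le n} \lambda(K, G)^j$, $n \in \mathbb{N}$, are bounded by 1. The result now follows immediately from Proposition 5.4. □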
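The partial-sum argument in the proof can be illustrated on Gram matrices, which are kernels on a finite input set. The sketch below assumes two hypothetical 2x2 Gram matrices `K` and `G` with $G - K$ positive semi-definite, so that $K \ll G$ holds with constant 1, and checks that the entrywise truncated exponentials and the entrywise exponentials preserve the ordering. It is a numerical check on one example, not a proof.

```python
from math import exp, factorial
from typing import Callable, List

Matrix = List[List[float]]


def entrywise(f: Callable[[float], float], A: Matrix) -> Matrix:
    """Apply a scalar function to every entry of a matrix."""
    return [[f(x) for x in row] for row in A]


def sub(A: Matrix, B: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def is_psd_2x2(M: Matrix, tol: float = 1e-12) -> bool:
    """Symmetric 2x2 test: nonnegative diagonal and determinant (up to tol)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    symmetric = abs(M[0][1] - M[1][0]) <= tol
    return symmetric and M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol


def truncated_exp(n: int) -> Callable[[float], float]:
    """Scalar partial sum t -> sum_{j=0}^{n} t^j / j!."""
    return lambda t: sum(t ** j / factorial(j) for j in range(n + 1))


# Hypothetical Gram matrices; G - K equals K, which is PSD.
K = [[1.0, 0.5], [0.5, 1.0]]
G = [[2.0, 1.0], [1.0, 2.0]]

# G_n - K_n for the partial sums n = 0, ..., 7.
partial_ok = [
    is_psd_2x2(sub(entrywise(truncated_exp(n), G), entrywise(truncated_exp(n), K)))
    for n in range(8)
]
# Limit case: entrywise e^G - e^K.
limit_diff = sub(entrywise(exp, G), entrywise(exp, K))

print(all(partial_ok), is_psd_2x2(limit_diff))
```

Here powers and exponentials are taken entrywise, matching the pointwise product of kernels in Corollary 5.3 rather than the matrix product.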

The arguments used in the above proof in fact are able to prove a more general result, which we 
present below. 

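Proposition 5.6 Let $K$ and $G$ be two kernels on $X$ with $\mathcal{H}_K \subseteq \mathcal{H}_G$. Suppose that $\phi$ is an analytic function with nonnegative Taylor coefficients $a_j$, $j \ge 0$, at the origin. Then $\mathcal{H}_{\phi(K)} \subseteq \mathcal{H}_{\phi(\lambda(K,G)G)}$. If, in addition, $\lambda(K, G) \le 1$, then $\mathcal{H}_{\phi(K)} \subseteq \mathcal{H}_{\phi(G)}$.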

6 Equivalent Norm Inclusion 

In this section, we investigate a special inclusion relation where an equivalence on the norms on the smaller space is imposed. Specifically, for two kernels $K, G$ on $X$, we write $\mathcal{H}_K \preceq \mathcal{H}_G$ if $\mathcal{H}_K \subseteq \mathcal{H}_G$ and there exist positive constants $\alpha, \beta$ such that

$$\alpha \|f\|_{\mathcal{H}_K} \le \|f\|_{\mathcal{H}_G} \le \beta \|f\|_{\mathcal{H}_K} \quad \text{for all } f \in \mathcal{H}_K. \tag{6.1}$$

For an existing kernel $K$, we call a kernel $G$ a weak refinement of $K$ if $\mathcal{H}_K \preceq \mathcal{H}_G$. This is a relaxation of the refinement kernel defined in [24] and is expected to accommodate more examples of reproducing kernels.

We start our investigation with a characterization of the equivalent norm inclusion relation. The 
following result from [1] is needed. 

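Lemma 6.1 Let $K$ and $G$ be kernels on $X$. Then there holds for all $f \in \mathcal{H}_{K+G}$ that

$$\|f\|^2_{\mathcal{H}_{K+G}} = \min\{ \|f_1\|^2_{\mathcal{H}_K} + \|f_2\|^2_{\mathcal{H}_G} : f = f_1 + f_2,\ f_1 \in \mathcal{H}_K,\ f_2 \in \mathcal{H}_G \}.$$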

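Theorem 6.2 Let $K$ and $G$ be kernels on $X$ with $\mathcal{H}_K \subseteq \mathcal{H}_G$. Then $\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if there exists some constant $\delta > 0$ such that

$$\|g\|_{\mathcal{H}_{\lambda(K,G)G - K}} \ge \delta \|g\|_{\mathcal{H}_K} \quad \text{for each } g \in \mathcal{H}_K \cap \mathcal{H}_{\lambda(K,G)G - K}. \tag{6.2}$$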
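Proof: For notational simplicity, put $L := \lambda(K, G)G - K$. By Proposition 2.3, $L$ is a kernel on $X$. Suppose that condition (6.2) is satisfied. Note that for each $f \in \mathcal{H}_K \subseteq \mathcal{H}_G$ with the decomposition $f = f_1 + f_2$ where $f_1 \in \mathcal{H}_K$, $f_2 \in \mathcal{H}_L$, we have $f_2 \in \mathcal{H}_K \cap \mathcal{H}_L$. This together with $\mathcal{H}_{K+L} = \mathcal{H}_{\lambda(K,G)G}$ implies by Lemma 6.1 that for all $f \in \mathcal{H}_K$

$$\begin{aligned}
\|f\|^2_{\mathcal{H}_{\lambda(K,G)G}} &= \min_{f = f_1 + f_2} \{ \|f_1\|^2_{\mathcal{H}_K} + \|f_2\|^2_{\mathcal{H}_L} : f_1 \in \mathcal{H}_K,\ f_2 \in \mathcal{H}_L \} \\
&\ge \min_{f = f_1 + f_2} \{ \|f_1\|^2_{\mathcal{H}_K} + \delta^2 \|f_2\|^2_{\mathcal{H}_K} : f_1 \in \mathcal{H}_K,\ f_2 \in \mathcal{H}_L \} \\
&\ge \min\{1, \delta^2\} \min_{f = f_1 + f_2} \{ \|f_1\|^2_{\mathcal{H}_K} + \|f_2\|^2_{\mathcal{H}_K} : f_1 \in \mathcal{H}_K,\ f_2 \in \mathcal{H}_L \} \\
&\ge \frac{1}{2} \min\{1, \delta^2\} \|f\|^2_{\mathcal{H}_K}.
\end{aligned}$$

Recall that for all $f \in \mathcal{H}_G$,

$$\|f\|_{\mathcal{H}_G} = \sqrt{\lambda(K, G)}\, \|f\|_{\mathcal{H}_{\lambda(K,G)G}}.$$

By the above two equations and Proposition 2.3, we have for all $f \in \mathcal{H}_K$ that

$$\frac{1}{\sqrt{2}} \min\{1, \delta\} \sqrt{\lambda(K, G)}\, \|f\|_{\mathcal{H}_K} \le \|f\|_{\mathcal{H}_G} \le \sqrt{\lambda(K, G)}\, \|f\|_{\mathcal{H}_K},$$

in other words, $\mathcal{H}_K \preceq \mathcal{H}_G$.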

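Conversely, suppose that $\mathcal{H}_K \preceq \mathcal{H}_G$ but (6.2) does not hold for any $\delta > 0$. Then for each $n \in \mathbb{N}$, there exists $g_n \in \mathcal{H}_K \cap \mathcal{H}_L$ such that

$$\|g_n\|_{\mathcal{H}_L} < \frac{1}{n} \|g_n\|_{\mathcal{H}_K}. \tag{6.3}$$

Since $L \ll \lambda(K, G)G$, it follows from Lemmas 2.2 and 6.1 that $\mathcal{H}_L \subseteq \mathcal{H}_{\lambda(K,G)G}$ and

$$\|g_n\|_{\mathcal{H}_G} = \sqrt{\lambda(K, G)}\, \|g_n\|_{\mathcal{H}_{\lambda(K,G)G}} \le \sqrt{\lambda(K, G)}\, \|g_n\|_{\mathcal{H}_L} \quad \text{for all } n \in \mathbb{N}. \tag{6.4}$$

Equations (6.3) and (6.4) imply that

$$\|g_n\|_{\mathcal{H}_G} \le \frac{\sqrt{\lambda(K, G)}}{n} \|g_n\|_{\mathcal{H}_K} \quad \text{for all } n \in \mathbb{N},$$

contradicting (6.1). The proof is complete. □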
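As an application of Theorem 6.2, we have the following example.

Proposition 6.3 Consider the two finite polynomial kernels $K_p, K_q$ defined by (4.7) and (4.8). Then $\mathcal{H}_{K_p} \preceq \mathcal{H}_{K_q}$ if and only if $p \le q$.

Proof: By Proposition 4.3, $\mathcal{H}_{K_p} \subseteq \mathcal{H}_{K_q}$ if and only if $p \le q$. Thus, if $\mathcal{H}_{K_p} \preceq \mathcal{H}_{K_q}$ then $p \le q$. Suppose that $p \le q$. We introduce another kernel $K$ on $\mathbb{R}^d$ by setting

$$K(x, y) := \sum_{j=0}^{p} \binom{q}{j} (x, y)^j, \quad x, y \in \mathbb{R}^d.$$

Then by Proposition 4.1, $\mathcal{H}_K \subseteq \mathcal{H}_{K_q}$ and $\mathcal{H}_K = \mathcal{H}_{K_p}$. It is clear that $\mathcal{H}_K \cap \mathcal{H}_{K_q - K} = \{0\}$. By Theorem 6.2, $\mathcal{H}_K \preceq \mathcal{H}_{K_q}$. As $\mathcal{H}_K = \mathcal{H}_{K_p}$, we have $\mathcal{H}_{K_p} \preceq \mathcal{H}_{K_q}$. The proof is complete. □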
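Before moving on, we make a simple observation that if two kernels $K, G$ on $X$ satisfy $\mathcal{H}_K \preceq \mathcal{H}_G$ and $\mathcal{H}_K \ne \mathcal{H}_G$ then $\mathcal{H}_K$ cannot be dense in $\mathcal{H}_G$. For instance, consider two Gaussian kernels $G_{\gamma_1}$, $G_{\gamma_2}$ with $\gamma_1 < \gamma_2$. As $\mathcal{H}_{G_{\gamma_2}} \subseteq \mathcal{H}_{G_{\gamma_1}}$ and $\mathcal{H}_{G_{\gamma_2}}$ is dense in but not equal to $\mathcal{H}_{G_{\gamma_1}}$, $G_{\gamma_1}$ is not a weak refinement of $G_{\gamma_2}$.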

The main purpose of this section is to present two characterizations of the equivalent norm inclusion 
that are widely applicable to translation invariant kernels and Hilbert-Schmidt kernels. As the study 
would be similar to that in [24], we shall omit the proof and examples. 

Let $\mu, \nu$ be two finite positive Borel measures on a topological space $Y$. Set

$$\omega := \frac{\mu + \nu}{2} + \frac{|\mu - \nu|}{2},$$

where $|\mu - \nu|$ denotes the total variation measure of $\mu - \nu$. Then $\mu$ and $\nu$ are absolutely continuous with respect to $\omega$. Given a function $\phi : X \times Y \to \mathbb{C}$ such that $\phi(x, \cdot) \in L^2_\omega(Y)$ for all $x \in X$ and

$$\overline{\mathrm{span}}\{\phi(x, \cdot) : x \in X\} = L^2_\omega(Y), \tag{6.5}$$

we introduce two kernels $K_\mu$, $K_\nu$ on $X$ by setting

$$K_\mu(x, y) := (\phi(x, \cdot), \phi(y, \cdot))_{L^2_\mu(Y)}, \quad K_\nu(x, y) := (\phi(x, \cdot), \phi(y, \cdot))_{L^2_\nu(Y)}, \quad x, y \in X. \tag{6.6}$$

Our task is to characterize the equivalent norm inclusion relation $\mathcal{H}_{K_\mu} \preceq \mathcal{H}_{K_\nu}$ in terms of the measures $\mu$ and $\nu$. To this end, we write $\mu \preceq \nu$ if $\mu \ll \nu$ and there exist positive constants $\alpha, \beta$ such that $\alpha \le d\mu/d\nu \le \beta$ almost everywhere on $\{t \in Y : \frac{d\mu}{d\nu}(t) > 0\}$ with respect to $\nu$.

The following characterization theorem can be proved by arguments similar to those in [24].

Theorem 6.4 Suppose that $\phi : X \times Y \to \mathbb{C}$ satisfies (6.5) and $K_\mu$, $K_\nu$ are defined by (6.6). Then $\mathcal{H}_{K_\mu} \preceq \mathcal{H}_{K_\nu}$ if and only if $\mu \preceq \nu$.

The above theorem has a particular application to Hilbert-Schmidt kernels. For two nonnegative functions $a, b$ on $\mathbb{N}$, we write $a \preceq b$ if $\mathrm{supp}\, a \subseteq \mathrm{supp}\, b$ and there exist two positive constants $\alpha$ and $\beta$ such that $\alpha a_n \le b_n \le \beta a_n$ for each $n \in \mathrm{supp}\, a$. Here $\mathrm{supp}\, a := \{n \in \mathbb{N} : a_n \ne 0\}$. Recall the definition of Hilbert-Schmidt kernels (4.2) and (4.3) through a sequence of functions (4.1).

Proposition 6.5 Suppose that $\mathrm{span}\{\Phi(x) : x \in X\}$ is dense in both $\ell^2_a(\mathbb{N})$ and $\ell^2_b(\mathbb{N})$. Then $\mathcal{H}_{K_a} \preceq \mathcal{H}_{K_b}$ if and only if $a \preceq b$.
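The relation $a \preceq b$ in Proposition 6.5 is easy to test for finitely supported weights. The sketch below assumes finite sequences of nonnegative integers and returns the best constants $\alpha, \beta$ with $\alpha a_n \le b_n \le \beta a_n$ on $\mathrm{supp}\, a$; the function name `sequence_equivalence` is hypothetical, and the sample weights are the binomial coefficients of the finite polynomial kernels (4.7) and (4.8) with $p = 2$, $q = 4$.

```python
from fractions import Fraction
from math import comb
from typing import Optional, Sequence, Tuple


def sequence_equivalence(
    a: Sequence[int], b: Sequence[int]
) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (alpha, beta) with alpha * a_n <= b_n <= beta * a_n on supp a,
    or None when supp a is not contained in supp b."""
    if len(a) != len(b):
        raise ValueError("weight sequences must have the same length")
    ratios = []
    for a_n, b_n in zip(a, b):
        if a_n == 0:
            continue  # indices outside supp a impose no condition
        if b_n == 0:
            return None  # n lies in supp a but not in supp b
        ratios.append(Fraction(b_n, a_n))
    if not ratios:
        raise ValueError("a must be nontrivial")
    return min(ratios), max(ratios)


p, q = 2, 4
a = [comb(p, n) for n in range(q + 1)]  # weights of (1 + t)^p, zero-padded
b = [comb(q, n) for n in range(q + 1)]  # weights of (1 + t)^q

print(sequence_equivalence(a, b))  # constants exist: K_q weakly refines K_p
print(sequence_equivalence(b, a))  # None: supp b is not contained in supp a
```

For finitely supported sequences the constants always exist once the support condition holds; for infinite sequences the ratios $b_n / a_n$ must additionally stay bounded above and away from zero.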

We want to reemphasize that our results, though similar to those in [24] for refinement of reproducing kernels, much increase the chance of refining an existing kernel. Taking polynomial kernels as an instance, consider two such kernels

$$K_a(x, y) := \sum_{j=0}^{N} a_j (x, y)^j, \quad K_b(x, y) := \sum_{k=0}^{M} b_k (x, y)^k, \quad x, y \in \mathbb{R}^d,$$

where $a_j$, $b_k$ are positive constants. By Proposition 6.5, $\mathcal{H}_{K_a} \preceq \mathcal{H}_{K_b}$ if $N \le M$. However, asking $K_b$ to be a refinement kernel of $K_a$ would impose a strong additional requirement that $a_j = b_j$ for all $j \in \mathbb{N}_N$. A more concrete example is the kernels $K_p$, $K_q$ appearing in (4.7) and (4.8). By our discussion, if $p < q$ then $K_q$ is a weak refinement but not a refinement of $K_p$.